Microsoft and OpenAI Plot $100 Billion Stargate AI Supercomputer

OpenAI CEO Sam Altman, left, and Microsoft CEO Satya Nadella. Photo: Getty. Art: Shane Burke


Executives at Microsoft and OpenAI have been drawing up plans for a data center project that would contain a supercomputer with millions of specialized server chips to power OpenAI’s artificial intelligence, according to three people who have been involved in the private conversations about the proposal. The project could cost as much as $100 billion, according to a person who spoke to OpenAI CEO Sam Altman about it and a person who has viewed some of Microsoft’s initial cost estimates.
Microsoft would likely be responsible for financing the project, which would be 100 times more costly than some of today’s biggest data centers, demonstrating the enormous investment that may be needed to build computing capacity for AI in the coming years. Executives envisage the proposed U.S.-based supercomputer, which they have referred to as “Stargate,” as the biggest of a series of installations the companies are looking to build over the next six years.
The Takeaway
• Microsoft executives are looking to launch Stargate as soon as 2028
• The supercomputer would require an unprecedented amount of power
• OpenAI’s next major AI upgrade is expected to land by early next year
While the project has not been green-lit and the plans could change, they provide a peek into this decade’s most important tech industry tie-up and how far ahead the two companies are thinking. Microsoft so far has committed more than $13 billion to OpenAI so the startup can use Microsoft data centers to power ChatGPT and the models behind its conversational AI. In exchange, Microsoft gets access to the secret sauce of OpenAI’s technology and the exclusive right to resell that tech to its own cloud customers, such as Morgan Stanley. Microsoft also has baked OpenAI’s software into new AI Copilot features for Office, Teams and Bing.
Microsoft’s willingness to go ahead with the Stargate plan depends in part on OpenAI’s ability to meaningfully improve the capabilities of its AI, one of these people said. OpenAI last year failed to deliver a new model it had promised to Microsoft, showing how difficult the AI frontier can be to predict. Still, OpenAI CEO Sam Altman has said publicly that the main bottleneck holding up better AI is a lack of sufficient servers to develop it.
If Stargate moves forward, it would produce orders of magnitude more computing power than what Microsoft currently supplies to OpenAI from data centers in Phoenix and elsewhere, these people said. The proposed supercomputer would also require at least several gigawatts of power—equivalent to what’s needed to run at least several large data centers today, according to two of these people. Much of the project cost would lie in procuring the chips, two of the people said, but acquiring enough energy sources to run it could also be a challenge.
Such a project is “absolutely required” for artificial general intelligence (AI that can accomplish most of the computing tasks humans do), said Chris Sharp, chief technology officer of Digital Realty, a data center operator that hasn’t been involved in Stargate. Though the project’s scale seems unimaginable by today’s standards, he said that by the time such a supercomputer is finished, the numbers won’t seem as eye-popping.

A Microsoft data center near Phoenix that is unrelated to OpenAI. Image: Microsoft
The executives have discussed launching Stargate as soon as 2028 and expanding it through 2030, possibly needing as much as 5 gigawatts of power by the end, the people involved in the discussions said.
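To give a sense of scale, simple arithmetic converts the 5-gigawatt figure into annual energy. This is a back-of-envelope sketch, not a projection from Microsoft or OpenAI. It assumes the full load runs around the clock, and the 2 million GPU count is a hypothetical stand-in for "millions of specialized server chips."

```python
# Back-of-envelope energy arithmetic for a 5 GW facility.
# Assumption (not from the article): the load runs continuously all year.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_twh(power_gw: float, utilization: float = 1.0) -> float:
    """Energy in terawatt-hours for a load of `power_gw` gigawatts at `utilization`."""
    gwh = power_gw * utilization * HOURS_PER_YEAR  # GW x hours = GWh
    return gwh / 1_000  # 1 TWh = 1,000 GWh


def watts_per_gpu(power_gw: float, gpu_count: int) -> float:
    """Average facility power per GPU (this share also covers cooling, networking, etc.)."""
    return power_gw * 1e9 / gpu_count


print(f"{annual_energy_twh(5):.1f} TWh per year at full load")
# 2,000,000 GPUs is a hypothetical count chosen only for illustration.
print(f"{watts_per_gpu(5, 2_000_000):,.0f} W per GPU")
```

A real facility would not run at full load every hour. The `utilization` parameter scales the estimate down accordingly.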
Phase Five
Altman and Microsoft employees have talked about these supercomputers in terms of five phases, with phase 5 being Stargate, named for a science fiction film in which scientists develop a device for traveling between galaxies. (The codename originated with OpenAI but isn’t the official project codename that Microsoft is using, said one person who has been involved.)
The phase prior to Stargate would cost far less. Microsoft is working on a smaller, phase 4 supercomputer for OpenAI that it aims to launch around 2026, according to two of the people. Executives have planned to build it in Mt. Pleasant, Wisc., where the Wisconsin Economic Development Corporation recently said Microsoft broke ground on a $1 billion data center expansion. The supercomputer and data center could eventually cost as much as $10 billion to complete, one of these people said. That’s many times more than the cost of existing data centers. Microsoft also has discussed using Nvidia-made AI chips for that project, said a different person who has been involved in the conversations.
Today, Microsoft and OpenAI are in the middle of phase 3 of the five-phase plan. Much of the cost of the next two phases will involve procuring the AI chips. Two data center practitioners who aren’t involved in the project said it’s common for AI server chips to make up around half of the total initial cost of AI-focused data centers other companies are currently building.
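Applying the practitioners' rule of thumb to the cost ceilings reported above gives a rough split between chips and everything else. The article does not confirm that the roughly 50% chip share seen at other companies' AI data centers also holds for phases 4 and 5. This sketch simply assumes it does.

```python
# Rough split of project cost into chips vs. everything else.
# Assumption: the ~50% chip share cited for other AI data centers applies here.


def chip_cost_split(total_usd_bn: float, chip_share: float = 0.5) -> tuple[float, float]:
    """Return (chip spend, non-chip spend) in billions of dollars."""
    if not 0.0 <= chip_share <= 1.0:
        raise ValueError("chip_share must be between 0 and 1")
    chips = total_usd_bn * chip_share
    return chips, total_usd_bn - chips


# Upper-end cost figures quoted in the article for each phase.
for label, total in [("Phase 4 (up to)", 10.0), ("Stargate (up to)", 100.0)]:
    chips, rest = chip_cost_split(total)
    print(f"{label}: ~${chips:.0f}B on chips, ~${rest:.0f}B on everything else")
```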
All told, the proposed efforts could cost in excess of $115 billion, more than three times what Microsoft spent last year on capital expenditures for servers, buildings and other equipment. Microsoft was on pace to spend around $50 billion this year, assuming it continues the pace of capital expenditures it disclosed in the second half of 2023. Microsoft CFO Amy Hood said in January that such spending will increase “materially” in the coming quarters, driven by investments in “cloud and AI infrastructure.”
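Two divisions check the comparison with Microsoft's capital spending, using only the figures quoted above. If $115 billion is more than three times last year's capital expenditures, those expenditures were below about $38 billion. At the roughly $50 billion annual pace, the program equals a bit more than two years of spending.

```python
# Relating the reported program cost to Microsoft's capital spending.
# Inputs are the article's figures; the outputs are plain ratios.

TOTAL_PROGRAM_BN = 115.0  # "in excess of $115 billion"
CAPEX_PACE_BN = 50.0      # estimated annual pace from second-half 2023 disclosures


def implied_ceiling(total_bn: float, multiple: float) -> float:
    """If total > multiple x last year's capex, last year's capex < total / multiple."""
    return total_bn / multiple


def years_of_spending(total_bn: float, annual_bn: float) -> float:
    """How many years of capital spending at `annual_bn` the total represents."""
    return total_bn / annual_bn


print(f"Last year's capex: under ${implied_ceiling(TOTAL_PROGRAM_BN, 3):.1f}B")
print(f"Years of spending at the current pace: {years_of_spending(TOTAL_PROGRAM_BN, CAPEX_PACE_BN):.1f}")
```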
Frank Shaw, a Microsoft spokesperson, did not comment about the supercomputing plans but said in a statement: “We are always planning for the next generation of infrastructure innovations needed to continue pushing the frontier of AI capability.” An OpenAI spokesperson did not have a comment for this article.
Altman has said privately that Google, one of OpenAI’s biggest rivals, will have more computing capacity than OpenAI in the near term, and publicly he has complained about not having as many AI server chips as he’d like.
That’s one reason he has been pitching the idea of a new server chip company that would develop a chip rivaling Nvidia’s graphics processing unit, which today powers OpenAI’s software. Demand for Nvidia GPU servers has skyrocketed, driving up costs for customers such as Microsoft and OpenAI. Besides controlling costs, Microsoft has other potential reasons to support Altman’s alternative chip. The GPU boom has put Nvidia in the position of kingmaker as it decides which customers can have the most chips, and it has aided small cloud providers that compete with Microsoft. Nvidia has also muscled into reselling cloud servers to its own customers.
With or without Microsoft, Altman’s effort would require significant investments in power and data centers to accompany the chips. Stargate is designed to give Microsoft and OpenAI the option of using GPUs made by companies other than Nvidia, such as Advanced Micro Devices, or even an AI server chip Microsoft recently launched, said the people who have been involved in the discussions. It isn’t clear whether Altman believes the GPUs he aims to develop in the coming years, which do not yet exist, will be ready for Stargate.
The total cost of the Stargate supercomputer could depend on software and hardware improvements that make data centers more efficient over time. The companies have discussed the possibility of using alternative power sources, such as nuclear energy, according to one of the people involved. (Amazon just purchased a Pennsylvania data center site with access to nuclear power. Microsoft also had discussed bidding on the site, according to two people involved in the talks.) Altman himself has said that developing superintelligence will likely require a significant energy breakthrough.
Packed Racks
To make Stargate a reality, Microsoft also would have to overcome several technical challenges, the two people said. For instance, the current proposed design calls for putting many more GPUs into a single rack than Microsoft is used to, to increase the chips’ efficiency and performance. Because of the higher density of GPUs, Microsoft would also need to come up with a way to prevent the chips from overheating, they said.
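The cooling problem follows from a simple relationship. Nearly all the electricity a rack draws leaves it as heat, so the heat per rack rises in step with the number of GPUs packed into it. The per-GPU wattage and rack densities below are hypothetical round numbers chosen for illustration. They are not Microsoft's or Nvidia's specifications.

```python
# Rack power (and therefore heat to remove) scales linearly with GPU density.
# Assumption for illustration: ~1,000 W per GPU including its share of the host server.

WATTS_PER_GPU = 1_000  # hypothetical per-GPU draw


def rack_power_kw(gpus_per_rack: int, watts_per_gpu: float = WATTS_PER_GPU) -> float:
    """Electrical load of one rack in kilowatts; nearly all of it becomes heat."""
    return gpus_per_rack * watts_per_gpu / 1_000


for gpus in (16, 32, 72):  # hypothetical densities
    print(f"{gpus:>3} GPUs/rack -> ~{rack_power_kw(gpus):.0f} kW of heat per rack")
```

Doubling the GPUs in a rack doubles the heat that the cooling system must remove from the same floor space. That is why denser designs push operators toward new cooling approaches.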
Microsoft and OpenAI are also debating which cables they will use to string the millions of GPUs together. The networking cables are crucial for moving large amounts of data in and out of server chips quickly. OpenAI has told Microsoft it doesn’t want to use Nvidia’s proprietary InfiniBand cables in the Stargate supercomputer, even though Microsoft currently uses the Nvidia cables in its existing supercomputers, according to two people who were involved in the discussions. (OpenAI instead wants to use more generic Ethernet cables.) Switching away from InfiniBand could make it easier for OpenAI and Microsoft to lessen their reliance on Nvidia down the line.
AI computing is more expensive and complex than traditional computing, which is why companies closely guard the details about their AI data centers, including how GPUs are connected and cooled. For his part, Nvidia CEO Jensen Huang has said companies and countries will need to build $1 trillion worth of new data centers in the next four to five years to handle all of the AI computing that’s coming.
Microsoft and OpenAI executives have been discussing the data center project since at least last summer. Besides CEO Satya Nadella and Chief Technology Officer Kevin Scott, other Microsoft managers who have been involved in the supercomputer talks have included Pradeep Sindhu, who leads strategy for the way Microsoft stitches together AI server chips in its data centers, and Brian Harry, who helps develop AI hardware for the Azure cloud server unit, according to people who have worked with them.

OpenAI President Greg Brockman, left, and Microsoft Chief Technology Officer Kevin Scott. Photo: YouTube/Microsoft Developer
The partners are still ironing out several key details, which they might not finalize anytime soon. It is unclear where the supercomputer will be physically located and whether it will be built inside one data center or multiple data centers in close proximity. Clusters of GPUs tend to work more efficiently when they are located in the same data center.
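Distance is one reason for that efficiency gap. Training a large model requires GPUs to exchange data constantly, and signals in optical fiber travel at roughly two-thirds the speed of light in a vacuum. The sketch below computes only that propagation floor, before any delay added by switches or other equipment. It assumes a typical fiber refractive index of 1.47, and the distances are hypothetical.

```python
# Propagation delay in optical fiber: signals travel at about c / n.
# Assumptions: n = 1.47 (typical silica fiber); distances are illustrative only.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value


def one_way_delay_us(distance_m: float, n: float = FIBER_REFRACTIVE_INDEX) -> float:
    """One-way propagation delay in microseconds over `distance_m` meters of fiber."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S / n) * 1e6


for label, meters in [("same building", 200), ("adjacent building", 2_000), ("50 km away", 50_000)]:
    print(f"{label:>17}: {one_way_delay_us(meters):8.2f} us one way")
```

Each added kilometer costs about 5 microseconds each way. That delay is repeated on every exchange between GPUs in separate buildings.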
OpenAI has already pushed the boundaries of what Microsoft can do with data centers. After making its initial investment in the startup in 2019, Microsoft built its first GPU supercomputer, containing thousands of Nvidia GPUs, to handle OpenAI’s computing demands, spending $1.2 billion on the system over several years. This year and next year, Microsoft has planned to provide OpenAI with servers housing hundreds of thousands of GPUs in total, said a person with knowledge of its computing needs.
The Next Barometer: GPT-5
Microsoft and OpenAI’s grand designs for world-beating data centers depend almost entirely on whether OpenAI can help Microsoft justify the investment in those projects by taking major strides toward superintelligence—AI that can help solve complex problems such as cancer, fusion, global warming or colonizing Mars. Such attainments may be a far-off dream. While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon and Google have quietly tempered expectations for sales, in part because such AI is costly and requires a lot of work to launch inside large enterprises or to power new features in apps used by millions of people.
Altman said at an Intel event last month that AI models get “predictably better” when researchers throw more computing power at them. OpenAI has published research on this topic, which it refers to as the “scaling laws” of conversational AI.
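The "predictably better" claim refers to power-law relationships between training compute and a model's loss (its error on held-out text, where lower is better). The sketch below uses that general form, L(C) = (C0 / C)^alpha. Its constants are hypothetical placeholders, not values OpenAI has published. It shows that each tenfold increase in compute cuts loss by the same fixed factor.

```python
# Illustrative compute scaling law of the form L(C) = (C0 / C) ** alpha.
# C0 and ALPHA are hypothetical placeholders, not fitted values.

C0 = 1.0e8    # hypothetical reference compute (arbitrary units)
ALPHA = 0.05  # hypothetical exponent


def loss(compute: float, c0: float = C0, alpha: float = ALPHA) -> float:
    """Predicted loss for a given amount of training compute under the power law."""
    return (c0 / compute) ** alpha


# Every 10x increase in compute multiplies loss by 10 ** -alpha, wherever you start.
for exponent in range(3, 7):
    c = 10.0 ** exponent
    ratio = loss(c) / loss(c / 10)
    print(f"compute 1e{exponent}: loss {loss(c):.4f}, ratio vs. 10x less compute {ratio:.4f}")

print(f"constant ratio predicted by the formula: {10 ** -ALPHA:.4f}")
```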
OpenAI “throwing ever more compute [power to scale up existing AI] risks leading to a ‘trough of disillusionment’” among customers as they realize the limits of the technology, said Ali Ghodsi, CEO of Databricks, which helps companies use AI. “We should really focus on making this technology useful for humans and enterprises. That takes time. I believe it’ll be amazing, but [it] doesn’t happen overnight.”
The stakes are high for OpenAI to prove that its next major conversational AI, known as a large language model, is significantly better than GPT-4, its most advanced LLM today. OpenAI released GPT-4 a year ago, and Google has released a comparable model in the meantime as it tries to catch up. OpenAI aims to release its next major LLM upgrade by early next year, said one person with knowledge of the process. It could release more incremental improvements to LLMs before then, this person said.
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
Anissa Gardizy is a reporter at The Information covering cloud computing. She was previously a tech reporter at The Boston Globe. Anissa is based in San Francisco and can be reached at anissa@theinformation.com or on Twitter at @anissagardizy8
Amir Efrati is executive editor at The Information, which he helped to launch in 2013. Previously he spent nine years as a reporter at the Wall Street Journal, reporting on white-collar crime and later about technology. He can be reached at amir@theinformation.com and is on Twitter @amir