Science & technology | The mother of invention

China’s AI firms are cleverly innovating around chip bans

Tweaks to software blunt the shortage of powerful hardware

An illustration of a yellow dragon emerging from a microchip-shaped hole against a red background. Illustration: Ben Hickey

Sep 19th 2024

Today’s top artificial-intelligence (AI) models rely on large numbers of cutting-edge processors known as graphics processing units (GPUs). Most Western companies have no trouble acquiring them. Llama 3, the newest model from Meta, a social-media giant, was trained on 16,000 H100 GPUs from Nvidia, an American chipmaker. Meta plans to stockpile 600,000 more before year’s end. xAI, a startup backed by Elon Musk, has built a data centre in Memphis powered by 100,000 H100s. And though OpenAI, another big model-maker, is tight-lipped about its GPU stash, it had its latest processors hand-delivered by Jensen Huang, Nvidia’s boss, in April.

This kind of access is a distant dream for most Chinese tech firms. Since October 2022 America has blocked the sale of high-performance processors to China. Some Chinese firms are rumoured to be turning to the black market to get their hands on these coveted chips. But the majority have shifted their focus to making the most of limited resources. Their results are giving Western firms food for thought.

Among the innovators is DeepSeek, a Chinese startup based in Hangzhou. Its latest model, DeepSeek-V2.5, launched in early September, holds its own against leading open-source models on coding challenges as well as on tasks in both English and Chinese. These gains are not down to size: DeepSeek is said to have just over 10,000 of Nvidia’s older GPUs, a big number for a Chinese firm but small by the standards of its American competitors.

DeepSeek makes up for this shortfall in a number of ways. First, the model is made up of many separate networks, each best suited to a different kind of problem. This “mixture of experts” approach lets the model delegate each task to the right network, so that only a fraction of the model does any work on a given input, which speeds up processing. Though DeepSeek has 236bn “parameters” (the virtual connections linking distinct bits of data), it uses less than a tenth of them for each new chunk of information it processes. The model also compresses new data before processing them, which helps it handle large inputs more efficiently.
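
The routing idea can be sketched in a few lines of Python. This is a generic “top-k gating” scheme, not DeepSeek’s own router, whose details the article does not give; the gate vectors, the toy experts and the function names are all hypothetical. Each expert receives a score, only the k best-scoring experts are run, and their outputs are blended in proportion to their scores, so the experts that are not chosen cost nothing for that input.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: Vector, gate_vectors: Sequence[Vector],
                experts: Sequence[Expert], k: int = 2) -> Tuple[Vector, List[int]]:
    """Send x to the k best-scoring experts and blend their outputs.

    Hypothetical toy router: one gate vector per expert, scored by dot product.
    Assumes every expert returns a vector the same length as its input.
    """
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate_vectors]
    probs = softmax(scores)
    # Pick the k most probable experts; the rest are skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalise over the chosen experts
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # only the chosen experts are ever evaluated
        w = probs[i] / norm
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out, chosen

if __name__ == "__main__":
    # Four hypothetical experts, each simply scaling its input by a different factor.
    experts = [lambda v, s=s: [s * v_j for v_j in v] for s in (1.0, 2.0, 3.0, 4.0)]
    gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    out, chosen = moe_forward([2.0, 1.0], gates, experts, k=2)
    print("experts used:", chosen)  # experts used: [0, 1]
    print("output:", out)
```

With four experts and k=2, half of the expert weights sit idle for each input. A large mixture-of-experts model applies the same principle at far greater scale, which is how a model with 236bn parameters can get away with using less than a tenth of them at a time.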

DeepSeek is not alone in finding creative solutions to a GPU shortage. MiniCPM, an open-source model developed by Tsinghua University and ModelBest, an AI startup, comes in two versions, with 2.4bn and 1.2bn parameters. Despite its small size, MiniCPM performs on language-related tasks about as well as large language models (LLMs) with between 7bn and 13bn parameters. Like DeepSeek’s model, it combines a mixture-of-experts approach with input compression. Like other small models, however, MiniCPM may perform less well outside the specific areas it was trained for.
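
Input compression can also be illustrated, with heavy caveats. The article does not say how DeepSeek or MiniCPM compress data, so the sketch below assumes the simplest version: a linear projection that squeezes each token’s vector into a much shorter one, plus a second projection that expands it again when needed. The sizes, the random matrices and the function names are hypothetical stand-ins; in a real model the projections are learned during training, so that the short vectors keep what matters.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

D_MODEL = 64   # hypothetical width of each token's vector
D_LATENT = 8   # hypothetical width after compression

def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random stand-in for a weight matrix that a real model would learn."""
    scale = 1.0 / cols ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

rng = random.Random(0)
W_DOWN = random_matrix(D_LATENT, D_MODEL, rng)  # 64 -> 8: compress
W_UP = random_matrix(D_MODEL, D_LATENT, rng)    # 8 -> 64: expand when needed

def compress(tokens: List[Vector]) -> List[Vector]:
    """Project every token vector down to D_LATENT numbers."""
    return [matvec(W_DOWN, t) for t in tokens]

def expand(latents: List[Vector]) -> List[Vector]:
    """Project compressed vectors back up to D_MODEL numbers (lossy)."""
    return [matvec(W_UP, z) for z in latents]

if __name__ == "__main__":
    tokens = [[rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)] for _ in range(100)]
    latents = compress(tokens)
    stored_before = sum(len(t) for t in tokens)
    stored_after = sum(len(z) for z in latents)
    print(stored_before, stored_after)  # 6400 800
```

Keeping 8 numbers per token instead of 64 cuts what must be stored and shuffled around by a factor of eight; the price is that the expanded vectors are only an approximation of the originals.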

MiniCPM’s tiny size makes it well-suited for personal devices. In August its creators released a version of the model for mobile phones, which supports multiple languages and works with various types of data, from text and images to audio.
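
A rough calculation shows why parameter count matters so much on a phone. The sketch below counts only the memory needed to hold a model’s weights, ignoring working memory and other overheads, and the precisions (bits per parameter) are illustrative: the article does not say how MiniCPM’s phone version stores its numbers. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Gigabytes (10**9 bytes) needed just to store a model's weights."""
    return n_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    models = {"MiniCPM 1.2bn": 1.2e9, "MiniCPM 2.4bn": 2.4e9, "DeepSeek 236bn": 236e9}
    for name, n in models.items():
        for bits in (16, 8, 4):
            print(f"{name:15s} {bits:2d}-bit: {weight_memory_gb(n, bits):7.1f} GB")
```

At 16 bits per weight the 2.4bn-parameter MiniCPM needs about 4.8GB, shrinking to about 1.2GB at 4 bits, whereas DeepSeek’s 236bn parameters need roughly 472GB at 16 bits, far more than any phone holds.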

Similar approaches are being tried elsewhere. FlashAttention-3, an algorithm developed by researchers from Together.ai, Meta and Nvidia, speeds up the training and running of LLMs by tailoring its design to Nvidia’s H100 GPUs. JEST, another algorithm, released in July by Google DeepMind, is fed small quantities of high-quality data for its initial training before being let loose on larger, lower-quality data sets. The company claims this approach is 13 times faster and ten times more efficient than other methods. Researchers at Microsoft, which backs OpenAI, have also released a small language model called Phi-3 mini, with around 4bn parameters.
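
The core trick behind the FlashAttention family can be shown in miniature. Instead of computing every attention score at once and storing them all, the algorithm walks through the data block by block, keeping a running maximum and a running sum so that the softmax still comes out exactly right at the end. The sketch below is plain Python for a single query vector, with hypothetical function names. It leaves out everything FlashAttention-3 adds specifically for the H100, such as overlapping computation with data movement and using lower-precision arithmetic, and it makes no claim about speed; it only checks that the block-wise answer matches the straightforward one.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def naive_attention(q: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Reference version: compute every score, then one softmax over all of them."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def tiled_attention(q: Vector, keys: Sequence[Vector], values: Sequence[Vector],
                    block_size: int = 4) -> Vector:
    """Same result, but keys and values are visited one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf       # largest score seen so far
    running_sum = 0.0             # sum of exp(score - running_max) so far
    acc = [0.0] * len(values[0])  # weighted sum of values so far, same scaling
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so they are relative to the new maximum.
        correction = math.exp(running_max - new_max)  # exp(-inf) == 0.0 on the first block
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * v_j for a, v_j in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

if __name__ == "__main__":
    rng = random.Random(0)
    q = [rng.uniform(-1, 1) for _ in range(8)]
    keys = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
    values = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(10)]
    full = naive_attention(q, keys, values)
    tiled = tiled_attention(q, keys, values, block_size=3)
    print(max(abs(a - b) for a, b in zip(full, tiled)))  # tiny: rounding error only
```

Because only one block of scores exists at any moment, the full table of scores never has to be written out to slow memory, which is where the real implementations find much of their speed.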

For Chinese firms, unlike those in the West, doing more with less is not optional. But this may be no bad thing. After all, says Nathan Benaich of Air Street Capital, an AI investment fund, “The scarcity mindset definitely incentivises efficiency gains.” ■

This article appeared in the Science & technology section of the print edition under the headline “Miniature model-building”