The success of cheap Chinese models threatens America's lead—and poses a dilemma
IF THERE IS a single technology America needs to bring about the “thrilling new era of national success” that President Donald Trump promised in his inauguration speech, it is generative artificial intelligence. At the very least, AI will add to the next decade’s productivity gains, fuelling economic growth. At the most, it will power humanity through a transformation comparable to the Industrial Revolution.
Mr Trump’s hosting, the next day, of the launch of “the largest AI infrastructure project in history” shows he grasps the potential. But so does the rest of the world, and most of all China. Even as Mr Trump was giving his inaugural oration, a Chinese firm released the latest impressive large language model (LLM). Suddenly, America’s lead over China in AI looks smaller than at any time since ChatGPT became famous (see Briefing).
China’s catch-up is startling because it had been so far behind, and because America had set out to slow it down. Joe Biden’s administration feared that advanced AI could secure the Chinese Communist Party (CCP) military supremacy. So America has curtailed exports to China of the best chips for training AI and cut off China’s access to many of the machines needed to make substitutes. Behind its protective wall, Silicon Valley has swaggered. Chinese researchers devour American papers on AI; Americans have rarely returned the compliment.
Yet China’s most recent progress is upending the industry and embarrassing American policymakers. The success of the Chinese models, combined with industry-wide changes, could turn the economics of AI on its head. America must prepare for a world in which Chinese AI is breathing down its neck.
China’s LLMs are not the very best. But they are far cheaper to make. QwQ, owned by Alibaba, an e-commerce giant, was launched in November and is less than three months behind America’s top models. DeepSeek, whose creator was spun out of an investment firm, ranks seventh by one benchmark. It was apparently trained using 2,000 second-rate chips, versus 16,000 first-class chips for Meta’s model, which DeepSeek beats on some rankings. The cost of training an American LLM is tens of millions of dollars and rising. DeepSeek’s owner says it spent under $6m.
American firms can copy DeepSeek’s techniques if they want to, because its model is open-source. But cheap training will change the industry at the same time as model design is evolving. China’s inauguration-day release was DeepSeek’s “reasoning” model, designed to compete with a state-of-the-art offering by OpenAI (see Business section). These models talk to themselves before answering a query. This “thinking” produces a better answer, but it also uses more electricity. As the quality of output goes up, the costs mount.
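The arithmetic is straightforward. The sketch below uses hypothetical prices and token counts, not any provider’s real rates, to show how a reasoning model’s hidden “thinking” tokens multiply the marginal cost of answering a single query.

```python
# Illustrative only: hypothetical price and token counts, not real figures.
PRICE_PER_MILLION_OUTPUT_TOKENS = 10.0  # assumed dollars per million tokens

def query_cost(answer_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of one query when both visible and hidden output tokens are billed."""
    total = answer_tokens + thinking_tokens
    return total / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS

plain = query_cost(answer_tokens=500)                              # ordinary reply
reasoning = query_cost(answer_tokens=500, thinking_tokens=8_000)   # long chain of thought
print(f"plain ${plain:.4f} vs reasoning ${reasoning:.4f} ({reasoning/plain:.0f}x)")
```

Training is a one-off fixed cost; this per-query cost recurs with every use, which is why better answers push marginal costs up rather than down.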
The result is that, just as China has brought down the fixed cost of building models, so the marginal cost of querying them is going up. If those two trends continue, the economics of the tech industry would invert. In web search and social networking, replicating a giant incumbent like Google involved enormous fixed costs of investment and the capacity to bear huge losses. But the cost per search was infinitesimal. This, and the network effects inherent to many web technologies, made such markets winner-takes-all.
If good-enough AI models can be trained relatively cheaply, then models will proliferate, especially as many countries are desperate to have their own. And a high cost-per-query may likewise encourage more built-for-purpose models that yield efficient, specialised answers with minimal querying.
The other consequence of China’s breakthrough is that America faces asymmetric competition. It is now clear that China will innovate around obstacles such as a lack of the best chips, whether by efficiency gains or by compensating for an absence of high-quality hardware with more quantity. China’s homegrown chips are getting better, including those designed by Huawei, a technology firm that a generation ago achieved widespread adoption of its telecoms equipment with a cheap-and-cheerful approach (see Culture section).
If China stays close to the frontier, it could be the first to make the leap to superintelligence. Should that happen, it might gain more than just a military advantage. In a superintelligence scenario, winner-takes-all dynamics may suddenly reassert themselves. Even if the industry stays on today’s track, the widespread adoption of Chinese AI around the world could give the CCP enormous political influence, at least as worrying as the propaganda threat posed by TikTok, a Chinese-owned video-sharing app whose future in America remains unclear (see Business section).
What should Mr Trump do? His infrastructure announcement was a good start. America must clear legal obstacles to building data centres. It should also ensure that hiring foreign engineers is easy, and reform defence procurement to encourage the rapid adoption of AI.
Some argue that he should also repeal the chip-industry export bans. The Biden administration conceded that the ban failed to contain Chinese AI. Yet that does not mean it accomplished nothing. In the worst case, AI could be as deadly as nuclear weapons. America would never ship its adversaries the components for nukes, even if they had other ways of getting them. Chinese AI would surely be stronger still if it now regained easy access to the very best chips.
Agencies or agency
More important is to pare back Mr Biden’s draft “AI diffusion rule”, which would govern which countries have access to American technology. This is designed to force other countries into America’s AI ecosystem, but the tech industry has argued that, by laying down red tape, it will do the opposite. With every Chinese advance, this objection becomes more credible. If America assumes that its technology is the only option for the likes of India or Indonesia, it risks overplaying its hand. Some tech whizzes promise the next innovation will once again put America far in front. Perhaps. But it would be dangerous to take America’s lead for granted.
Briefing Chinese AI
Uncomfortably close
China's artificial-intelligence industry has almost caught up with America's, on the cheap
THE WORLD’s first “reasoning model”, an advanced form of artificial intelligence, was released in September by OpenAI, an American firm. o1, as it is called, uses a “chain of thought” to answer difficult questions in science and mathematics, breaking down problems to their constituent steps and testing various approaches to the task behind the scenes before presenting a conclusion to the user. Its unveiling set off a race to copy this method. Google came up with a reasoning model called “Gemini Flash Thinking” in December. OpenAI responded with o3, an update of o1, a few days later.
But Google, with all its resources, was not in fact the first firm to emulate OpenAI. Less than three months after o1 was launched, Alibaba, a Chinese e-commerce giant, released a new version of its Qwen chatbot, QwQ, with the same “reasoning” capabilities. “What does it mean to think, to question, to understand?” the company asked in a florid blog post with a link to a free-to-use version of the model. Another Chinese firm, DeepSeek, had released a “preview” of a reasoning model, dubbed R1, a week before that. Despite the American government’s efforts to hold back China’s AI industry, two Chinese firms had reduced their American counterparts’ technological lead to a matter of weeks.
It is not just with reasoning models that Chinese firms are in the vanguard: in December DeepSeek published a new large language model (LLM), a form of AI that analyses and generates text. v3 was almost 700 gigabytes, far too large to run on anything but specialist hardware, and had 685bn parameters, the individual numerical values that combine to form the model’s neural network. That made it bigger than anything previously released for free download. Llama 3.1, the flagship LLM of Meta, the parent of Facebook, which was released in July, has only 405bn parameters.
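The file size and the parameter count are consistent with each other if the released weights are stored at roughly one byte per parameter (8-bit precision), an assumption made here for illustration rather than a detail reported above:

```python
# Rough consistency check: 685bn parameters at an assumed one byte each
# (8-bit weight storage) comes to about the reported ~700 gigabytes.
params = 685e9
bytes_per_param = 1          # assumption: 8-bit weight storage
print(f"~{params * bytes_per_param / 1e9:.0f} GB")   # ~685 GB
```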
DeepSeek’s LLM is not only bigger than many of its Western counterparts, it is also better, matched only by the proprietary models at Google and OpenAI. Paul Gauthier, founder of Aider, an AI coding platform, ran the new DeepSeek model through his coding benchmark and found that it outclassed all its rivals except for o1 itself. LMSYS, a crowdsourced ranking of chatbots, puts it seventh, higher than any other open-source model and the highest produced by a firm other than Google or OpenAI (see chart).
Enter the dragon
Chinese AI is now so close in quality to its American rivals that the boss of OpenAI, Sam Altman, felt obliged to explain the narrowness of the gap. Shortly after DeepSeek released v3, he tweeted peevishly, “It is (relatively) easy to copy something that you know works. It is extremely hard to do something new, risky, and difficult when you don’t know if it will work.”
China’s AI industry had initially appeared second-rate. That may be in part because it has had to contend with American sanctions. In 2022 America banned the export of advanced chips to China. Nvidia, a leading chipmaker, has had to design special downgrades to its products for the Chinese market. America has also sought to prevent China from developing the capacity to manufacture top-of-the-line chips at home, by banning exports of the necessary equipment and threatening penalties for non-American firms that might help, too.
Another impediment is home-grown. Chinese firms came late to LLMs, in part owing to regulatory concerns. They worried about how censors would react to models that might “hallucinate” and provide incorrect information or, worse, come up with politically dangerous statements. Baidu, a search giant, had experimented with LLMs internally for years, and had created one called “ERNIE”, but was hesitant to release it to the public. Even when the success of ChatGPT prompted it to reconsider, it at first allowed access to ERNIE Bot by invitation only.
Eventually the Chinese authorities issued regulations to foster the AI industry. Although they called on model-makers to emphasise sound content and to adhere to “socialist values”, they also pledged to “encourage innovative development of generative AI”. China sought to compete globally, says Vivian Toh, editor of TechTechChina, a news site. Alibaba was one of the first wave of companies to adapt to the new permissive environment, launching its own LLM, initially called Tongyi Qianwen and later abbreviated to “Qwen”.
For a year or so, what Alibaba produced was nothing to be excited about: a fairly undistinguished “fork” based on Meta’s open-source Llama LLM. But over the course of 2024, as Alibaba released successive iterations of Qwen, the quality began to improve. “These models seem to be competitive with very powerful models developed by leading labs in the West,” said Jack Clark of Anthropic, a Western AI lab, a year ago, when Alibaba released a version of Qwen that is capable of analysing images as well as text.
China’s other internet giants, including Tencent and Huawei, are building their own models. But DeepSeek has different origins. It did not even exist when Alibaba released the first Qwen model. It is descended from High-Flyer, a hedge fund set up in 2015 to use AI to gain an edge in share-trading. Conducting fundamental research helped High-Flyer become one of the biggest quant funds in the country.
But the motivation wasn’t purely commercial, according to Liang Wenfeng, High-Flyer’s founder. The first backers of OpenAI weren’t looking for a return, he has observed; their motivation was to “pursue the mission”. The same month that Qwen launched in 2023, High-Flyer announced that it, too, was entering the race to create human-level AI and spun off its AI research unit as DeepSeek.
As OpenAI had before it, DeepSeek promised to develop AI for the public good. The company would make most of its training results public, Mr Liang said, to try to prevent the technology’s “monopolisation” by only a few individuals or firms. Unlike OpenAI, which was forced to seek private funding to cover the ballooning costs of training, DeepSeek has always had access to High-Flyer’s vast reserves of computing power.
DeepSeek’s gargantuan LLM is notable not just for its scale, but for the efficiency of its training, whereby the model is fed data from which it infers its parameters. This success derived not from a single, big innovation, says Nic Lane of Cambridge University, but from a series of marginal improvements. The training process, for instance, often used rounding to make calculations easier, but kept numbers precise when necessary. The server farm was reconfigured to let individual chips speak to each other more efficiently. And after the model had been trained, it was fine-tuned on output from DeepSeek R1, the reasoning system, learning how to mimic its quality at a lower cost.
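The rounding trick is the idea behind mixed-precision training. A minimal PyTorch sketch of the general approach follows; it runs the bulk of the arithmetic in bfloat16 while the master weights and optimiser state stay in 32-bit precision. DeepSeek’s own recipe reportedly uses a custom low-precision (FP8) scheme, so this illustrates the principle rather than its code.

```python
# Minimal mixed-precision sketch: most arithmetic is "rounded" to bfloat16,
# while weights and optimiser state stay in full 32-bit precision.
# Illustrates the principle only; DeepSeek's recipe is a custom FP8 scheme.
import torch
from torch import nn

model = nn.Linear(4096, 4096).cuda()        # stand-in for one transformer layer
opt = torch.optim.AdamW(model.parameters()) # fp32 master weights and state

x = torch.randn(8, 4096, device="cuda")
target = torch.randn(8, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)   # matmul runs in bf16

loss.backward()   # gradients accumulate against the fp32 weights
opt.step()
opt.zero_grad()
```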
Thanks to these and other innovations, coming up with v3’s billions of parameters took fewer than 3m chip-hours, at an estimated cost of less than $6m, about a tenth of the computing power and expense that went into Llama 3.1. v3’s training required just 2,000 chips, whereas Llama 3.1 used 16,000. And because of America’s sanctions, the chips v3 used weren’t even the most powerful ones. Western firms seem ever more profligate with chips: Meta plans to build a server farm using 350,000 of them. Like Ginger Rogers dancing backwards and in high heels, DeepSeek, says Andrej Karpathy, former head of AI at Tesla, has made it “look easy” to train a frontier model “on a joke of a budget”.
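Those figures hang together on the back of an envelope. Assuming a rental rate of roughly $2 per chip-hour, an assumption for illustration rather than a number reported above:

```python
# Back-of-the-envelope check of the reported training figures.
chip_hours = 2.8e6       # "fewer than 3m chip-hours"
chips = 2_000
rate = 2.0               # assumed dollars per chip-hour

print(f"cost ≈ ${chip_hours * rate / 1e6:.1f}m")            # ≈ $5.6m, under $6m
print(f"wall clock ≈ {chip_hours / chips / 24:.0f} days")   # ≈ 58 days of training
```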
Not only was the model trained on the cheap, running it costs less as well. DeepSeek splits tasks over multiple chips more efficiently than its peers and begins the next step of a process before the previous one is finished. This allows it to keep chips working at full capacity with little redundancy. As a result, in February, when DeepSeek starts to let other firms create services that make use of v3, it will charge less than a tenth of what Anthropic does for use of Claude, its LLM. “If the models are indeed of equivalent quality this is a dramatic new twist in the ongoing LLM pricing wars,” says Simon Willison, an AI expert.
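Starting the next step before the previous one has finished is, in effect, pipelining. The generic Python sketch below (not DeepSeek’s code) overlaps two stages of work so that neither worker sits idle; with two stages the total time falls towards half of running everything strictly in sequence.

```python
# Generic pipelining sketch: stage A of the next batch overlaps with stage B
# of the previous one, so the two workers ("chips") stay busy.
import time
from concurrent.futures import ThreadPoolExecutor

def stage_a(batch):         # e.g. the first half of a model's layers
    time.sleep(0.1)
    return f"a({batch})"

def stage_b(intermediate):  # e.g. the second half of the layers
    time.sleep(0.1)
    return f"b({intermediate})"

start = time.time()
results, pending = [], None
with ThreadPoolExecutor(max_workers=2) as pool:
    for batch in range(8):
        a_future = pool.submit(stage_a, batch)        # kick off the next stage A...
        if pending is not None:
            results.append(pending.result())          # ...while stage B still runs
        pending = pool.submit(stage_b, a_future.result())
    results.append(pending.result())
print(f"{len(results)} batches in {time.time() - start:.1f}s (sequential would be ~1.6s)")
```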
DeepSeek’s quest for efficiency has not stopped there. This week, even as it published R1 in full, it also released a set of smaller, cheaper and faster “distilled” variants, which are almost as powerful as the bigger model. That mimicked similar releases from Alibaba and Meta and proved yet again that it could compete with the biggest names in the business.
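Distillation, in its classic form, trains a small “student” model to match a large “teacher” model’s output distribution rather than just the raw training labels. The PyTorch sketch below shows that classic soft-label loss for illustration; DeepSeek’s distilled variants are reported to have been produced by fine-tuning smaller open models on text generated by the big one, a related but simpler recipe.

```python
# Classic knowledge-distillation loss (soft labels), shown for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student next-token distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes match an ordinary cross-entropy loss.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

student_logits = torch.randn(4, 32_000)   # a small model's logits over the vocabulary
teacher_logits = torch.randn(4, 32_000)   # the big model's logits on the same input
print(distillation_loss(student_logits, teacher_logits))
```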
The way of the dragon
Alibaba and DeepSeek challenge the most advanced Western labs in another way, too. Unlike OpenAI and Google, the Chinese labs follow Meta’s lead and make their systems available under an open-source licence. If you want to download a Qwen AI and build your own programming on top of it, you can; no specific permission is necessary. This permissiveness is matched by a remarkable openness: the two companies publish papers whenever they release new models that provide a wealth of detail on the techniques used to improve their performance.
When Alibaba released QwQ, standing for “Questions with Qwen”, it became the first firm in the world to publish such a model under an open licence, letting anyone download the full 20-gigabyte file and run it on their own systems or pull it apart to see how it works. That is a markedly different approach from OpenAI, which keeps o1’s internal workings hidden.
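In practice, downloading the file and running it on your own system looks like a few lines with the Hugging Face transformers library. The sketch below assumes the QwQ preview checkpoint’s published repository name and a machine with enough GPU memory to hold the weights:

```python
# Sketch of running an openly licensed model locally with Hugging Face
# transformers. The repository name is assumed to be the QwQ preview
# checkpoint as published; a model this size needs a large GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

prompt = "How many positive integers below 100 are divisible by 3 or 5?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```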
In broad strokes, both models apply what is known as “test-time compute”: in-
Near the top of the class
Selected large language models’ performance against different benchmarks, January 2025