
250620 Global Business Observer (全球商业观察), Week 25

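Overview:

1. Book of the week: A Brief History of Intelligence (《智能简史》), which traces the path to AGI through six leaps in intelligence.
2. AI's straw in the wind: three lessons the advertising industry offers for AI adoption. AI will very probably replace ordinary workers, but not the masters; AI will reinforce the Matthew effect in business; and AI will bring unexpected results.
3. Zuckerberg's acquihire makes Alexandr Wang one of the youngest billionaires. The point of the Scale AI deal was to bring Alexandr Wang into the fold. Sam Altman even revealed that Zuckerberg offered signing bonuses of $100m to poach staff from OpenAI. These figures are unprecedented. Zuckerberg's two best-known earlier acquisitions were WhatsApp for about $20bn and Instagram for $1bn; the AI arms race has pushed prices higher still, confirming the Matthew effect noted above: the strong get stronger.
4. How are university graduates in Europe and America faring in the job market? Rather badly. The wage premium that a degree brings is shrinking, the big tech platforms are cutting jobs, and AI is replacing a great deal of entry-level knowledge work.
5. AI tests are getting harder too; how else could they show that models keep improving?
6. Are China's universities the best in the world? In Nature's view, at least, they have made great strides in research: the Chinese Academy of Sciences, Tsinghua University, Peking University and Zhejiang University rank in the global top ten. Zhejiang University was dubbed "China's MIT" a few months ago, and the name is well earned.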

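1、Book of the week: A Brief History of Intelligence (《智能简史》)

A Brief History of Intelligence is an evolutionary history of intelligence and the brain. Set against the development of artificial intelligence over the past few decades (and its acceleration over the past 15 years), it shows that charting AI's course by understanding the brain is far more complicated than most people imagine. More interesting still, quite a few of the assumptions made in developing AI turn out to be keys to understanding the brain and intelligence.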

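The evolution of increasingly complex sources of learning: six breakthroughs in intelligence

In the evolution from animals to humans, intelligence went through five breakthroughs, the most recent being the emergence of language and writing 100,000 years ago. Are we nearing a sixth, the shift from carbon-based to silicon-based intelligence? What can the development of AGI learn from the evolution of intelligence?
1、Steering: telling good external stimuli from bad in order to navigate towards benefit and away from harm. Dopamine and serotonin, fight or flight, the earliest emotional templates.
2、Reinforcing: learning from one's own actual actions through trial and error (paid for with lives, though evolution preserves the genes that fit). The animal learns to repeat behaviours that brought positive value in its own past experience and to suppress those that brought negative value; a sense of time is crucial to this process. Training pets with rewards is a form of reinforcement learning. The AI counterpart is model-free reinforcement learning.
3、Simulating: learning from one's own imagined actions, a vicarious trial and error that costs far fewer lives. The arrival of the neocortex brought an entirely new learning system: learning through imagination lets animals replay past events (episodic memory), consider how those events might have gone differently (counterfactual learning) and so develop the ability to plan.
4、Mentalizing: learning from the actual actions of others. Imitation learning matters here, as does building one's own model of mind.
5、Language: learning from the imagined actions of others, from their episodic memories, internal simulations and counterfactual imaginings, so that ideas can accumulate across generations. (Note the difference between declarative and imperative labels.)
6、Ever since DNA, life has resisted entropy, and its essence lies in information rather than matter. The sixth breakthrough would be superintelligence: the emergence of silicon-based intelligence, moving intelligence from a biological medium to a digital one. Intelligence would escape the limits of the human brain, whose cognitive capacity is severely constrained by the processing speed of neurons, the body's energy budget and the maximum size a brain can reach in a carbon-based life form. Its hallmarks would be processing power (compute) that scales without limit as needed and, as AI becomes free to copy and reconfigure itself, the loss of clear boundaries of individuality (with the processing and storage of vast amounts of data), following purer evolutionary principles, namely variation and selection.
Seen from an evolutionary perspective, AGI's strength is that it has learned a great deal from human text, but it has yet to clear the third step (simulating) and the fourth (building its own model of mind).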
Figure: cover of A Brief History of Intelligence (《智能简史》)

Computers v creatives  计算机与创意人的对决

2、What the "cockroaches" of the ad world teach about dealing with AI: A rosé-soaked meeting in Cannes is like a postcard from the future
2、广告界的"小强"教会我们如何应对 AI——戛纳一场玫瑰香槟的会议宛如来自未来的明信片

WHEN ADVERTISING executives describe themselves as cockroaches, they are not being self-deprecating. Admen have shown a remarkable ability to survive what look like extinction-level events. Copywriters adapted to radio in the 1920s; artists embraced TV in the 1950s. Agencies clung on in the early 2000s as ads moved online. This week, in the face of another technological revolution, the admen steadfastly held their annual jamboree on the French Riviera.
当广告高管们自比为蟑螂时,这并非自谦之词。广告人展现出惊人的生存能力,总能挺过看似灭绝级的危机。上世纪 20 年代文案撰稿人适应了广播,50 年代美术指导拥抱了电视,千禧年初广告业向线上迁移时代理商们依然坚挺。本周面对又一场技术革命,广告人们依然雷打不动地在法国里维埃拉举行年度盛会。
The latest upheaval, brought by artificial intelligence (AI), is testing the cockroaches as never before. Advertising is one of the sectors most radically affected by AI so far. As such, adland offers a postcard from the future for other industries. Three lessons stand out.
人工智能(AI)带来的最新变革正以前所未有的方式考验着行业的"蟑螂理论"。广告业是目前受 AI 冲击最彻底的领域之一,堪称其他行业窥见未来的明信片。其中有三大启示尤为突出。
The first is that the moat between human workers and chatbot rivals is narrower than most people think. Creative work is often seen as immune from automation. Large language models (LLMs) are designed to predict the most likely answer, which is often the opposite of the most original one. The best ads remain too weird and wonderful for any machine to have dreamt up: consider the campaign that attached step-counters to chickens to advertise free-range eggs.
首先,人类工作者与聊天机器人对手之间的护城河比多数人想象的更窄。创意工作常被视为自动化免疫区,但大型语言模型(LLMs)的设计原理是预测最可能的答案——这往往与最具原创性的答案背道而驰。最出色的广告因其怪诞绝妙仍非机器所能构想:比如给鸡绑上计步器来宣传散养鸡蛋的案例。
Yet this week in Cannes TikTok, Meta, Google and other ad platforms showed off AI-powered features that can create passable video or rewrite ad copy at the click of a button. Their output will not win any awards. That does not matter. Most of the $1trn that is spent on ads each year goes towards workmanlike
然而本周在戛纳,TikTok、Meta、谷歌等广告平台纷纷展示 AI 功能,点击按钮即可生成合格视频或重写广告文案。这些产出虽无缘奖项,但无关紧要:每年 1 万亿美元的广告支出中,绝大部分流向的正是这类匠气之作。

campaigns, rather than Cannes trophy-bait. Sam Altman’s prediction that AI will one day be able to do 95% of marketing may sound like boosterism for his firm, OpenAI. But the inspired human-made content that people present as a counter-argument is firmly within the remaining 5%. Robots will content themselves with the rest.
营销活动,而非戛纳奖杯的诱饵。萨姆·奥特曼预测人工智能终将能完成 95% 的营销工作,这听起来或许像在为其公司 OpenAI 造势。但人们作为反驳论据提出的那些充满灵性的人创内容,恰恰属于剩余的 5% 部分。机器人将满足于处理其余工作。
Another lesson is that the biggest companies have the most to gain. This runs counter to a popular narrative, that AI will democratise skills and intelligence. It is true that the new tools from Meta and co will allow millions of micro-businesses to produce video ads of a quality that was once out of their reach, and translate text into several languages. Global campaigns can now be launched online for hundreds of dollars; TV-worthy commercials are being put together for a few thousand.
另一个启示是,规模越大的企业获益越多。这与"AI 将普及技能与智慧"的主流论调背道而驰。诚然,Meta 等公司的新工具能让数百万小微企业制作出曾经难以企及的高质量视频广告,并实现多语言文本翻译。如今只需数百美元就能发起全球线上营销活动,花费几千美元就能制作出电视级广告片。
But take a step back and it is clear that the serious money is being made by the giants. The selling of ads was already becoming more concentrated: four tech firms that accounted for a third of the global ad market five years ago now account for half of it. And America’s biggest companies are ramping up their AI investment at a faster rate than the rest. No wonder: AI requires computing muscle and large data sets, both of which are expensive. Whereas human intelligence is more or less randomly distributed, the artificial kind can be bought. Rather than democratise access to intelligence, AI may allow the richest to hoard it.
但退一步看,显然巨头们才是真正赚大钱的一方。广告销售本就日趋集中:五年前占据全球广告市场三分之一份额的四家科技公司,如今已占据半壁江山。而美国最大型企业正以远超同行的速度加码人工智能投资。这并不意外:人工智能需要强大算力和海量数据集,二者都价格不菲。如果说人类智力多少是随机分布的,那么人工智能则可以用金钱购买。与其说 AI 实现了智力平权,不如说它可能让最富有的群体独占智慧资源。
The last lesson from adland is that AI’s spread will have unpredictable consequences. Some advertisers are shifting their budgets from TV to the humble outdoor billboard. Why? In part because AI has made it possible to infer from vast data sets whether consumers who saw the ad bought the product, allowing marketers to measure the campaign’s effectiveness rather than guess at it. Another unexpected winner is old-school public relations. As consumers switch from search-engines to chatbots, brands need to persuade LLMs to speak highly of them. The most effective way to do that is to influence the sources that the model pays most attention to, such as news articles. In the AI age, high-tech “search-engine optimisation” may be less effective than offline schmoozing (or so, at least, marketers can insist when presenting their
广告业带来的最后一个启示是,人工智能的普及将产生难以预料的后果。部分广告主正将预算从电视转向不起眼的户外广告牌。原因何在?部分在于人工智能能通过海量数据集推断出看过广告的消费者是否购买了商品,使营销人员能精准衡量广告效果而非凭空猜测。另一个意外赢家是传统公关行业。随着消费者从搜索引擎转向聊天机器人,品牌需要说服 LLMs 为其美言。最有效的方式是影响模型最关注的信息源——例如新闻报道。在人工智能时代,高科技的"搜索引擎优化"可能还不如线下公关活动有效(至少营销人员在提案时可以这么坚持)。

post-Cannes expenses claims).
戛纳之行后的报销单)。
Adland is an outlier in important ways. Ad spending is highly cyclical, so the industry has benefited more than most from the AI-fuelled boom of recent years. The big tech firms that are active in ads also happen to be leaders in AI, and have used ads to test their newest products. And not everyone has the admen’s knack for survival. But the rest of the business world should pay attention to the cockroaches of Cannes. The revolution in adland is a taste of what is to come.
广告业在许多重要方面都显得与众不同。广告支出具有极强的周期性,因此该行业从近年来人工智能驱动的繁荣中获益远超多数行业。活跃于广告领域的大型科技公司恰巧也是人工智能领域的领军者,它们利用广告来测试最新产品。并非所有人都具备广告人那种生存智慧。但商界其他领域都该关注这些戛纳的"小强"。广告业的革命正是未来变革的预演。

Zucked in  扎克伯格豪赌 AI

3、Mark Zuckerberg is spending megabucks on an AI hiring spree: Meta engages in a high-stakes battle to unseat OpenAI
3、马克·扎克伯格正斥巨资大举招聘 AI 人才 Meta 与 OpenAI 展开高风险对决

WHEN MARK ZUCKERBERG decided to launch his quest for the metaverse in 2021, he threw fistfuls of cash at the effort. Meta’s boss is now repeating the act, this time with generative artificial intelligence (AI). Hot on the heels of what may be the world’s most expensive acquihire (a $14.3bn deal to buy 49% of Scale AI, a data-labelling firm whose main asset is Alexandr Wang, its 28-year-old founder), people close to the matter say Mr Zuckerberg is planning to offer more than $1bn combined for two of Silicon Valley’s hottest AI brain boxes, who would work under Mr Wang. It marks the start of a reset of Meta’s generative-AI ambitions.
当马克·扎克伯格在 2021 年决定进军元宇宙时,他为此投入了大笔资金。如今这位 Meta 掌门人正重演豪掷千金的戏码,这次押注的是生成式人工智能。在刚完成可能是全球最昂贵的"收购式招聘"——收购数据标注公司 Scale AI(其核心资产是 28 岁的创始人王亚历山德)之后,知情人士透露扎克伯格计划以超 10 亿美元总价招揽硅谷两位顶尖 AI 人才,他们将归入王亚历山德麾下。这标志着 Meta 对生成式 AI 战略的重新调整。
Meta has made no comment, but if the deal goes through Nat Friedman and Daniel Gross, entrepreneurs and partners in a venture-capital (VC) firm called NFDG, will work in Meta’s “superintelligence” unit under Mr Wang, one of America’s youngest self-made billionaires. The word “superintelligence” is somewhat misleading. Rather than ground-breaking AI research, the team is
Meta 尚未置评,但若交易达成,风险投资公司 NFDG 的合伙人纳特·弗里德曼与丹尼尔·格罗斯将加入王亚历山德领导的"超级智能"部门。这位美国最年轻白手起家亿万富翁领衔的"超级智能"团队名不副实——他们并非从事突破性 AI 研究,

expected to focus on developing new AI products for Meta, some of whose recent efforts, including its latest Llama model and the Meta AI chatbot, have disappointed.
而是专注于为 Meta 开发新型 AI 产品。该公司近期推出的 Llama 大模型和 Meta AI 聊天机器人等产品均未达预期。
Someone who knows all three men calls the trio “the avengers”. He reckons they will have huge additional sums at their disposal to hire top AI researchers in order to unseat OpenAI, maker of ChatGPT, as the dominant generative-AI firm. “They’re going to go big,” he says.
一位熟悉这三人的知情人士称他们为"复仇者联盟"。他认为三人将掌握巨额资金用于招募顶尖 AI 研究员,旨在颠覆 ChatGPT 开发商 OpenAI 在生成式 AI 领域的主导地位。"他们要干一票大的,"该人士表示。
Indeed, there appear to be few limits on what Meta is prepared to spend. On June 17th Sam Altman, the boss of OpenAI, said on a podcast that Meta was offering signing bonuses of $100m to poach his staff. Another person close to the situation says Mr Zuckerberg sought to hire Ilya Sutskever, the brains behind ChatGPT and co-founder of Safe Superintelligence (SSI), another hot AI startup, to work at Meta, though he was unsuccessful. “He is throwing insane amounts of money at people,” the person said.
Meta 的投入似乎确实没有上限。6 月 17 日,OpenAI 首席执行官萨姆·奥尔特曼在播客中透露,Meta 为挖角其员工开出了 1 亿美元的签约奖金。另一位接近内情的消息人士称,扎克伯格曾试图招揽 ChatGPT 核心开发者、新锐 AI 公司 Safe Superintelligence(SSI)联合创始人伊利亚·苏茨克沃加入 Meta,但未能成功。"他正在向人才疯狂砸钱,"该人士表示。
The gambit shows Mr Zuckerberg’s continued willingness to make mighty, long-term bets to reinvent his firm, even if his foray into the metaverse has been a costly flop. “This is very Zuckerbergian to do these big, loud stunts just to prove how committed he is,” says Eric Seufert, an independent tech analyst. And while the sums are big, they may not be as reckless as some pundits argued when reports surfaced that Meta was buying its stake in Scale AI, considering how much of Meta’s $1.7trn market value is riding on its success in AI. The acquisitions also involve people with close personal ties and shared ideals.
扎克伯格的这一策略显示出他仍愿意进行大胆的长期押注来重塑公司,尽管他对元宇宙的探索已成为代价高昂的失败。独立科技分析师埃里克·瑟弗特表示:"这种大张旗鼓的举动非常符合扎克伯格的风格,就是为了证明他的决心。"虽然涉及金额巨大,但考虑到 Meta 高达 1.7 万亿美元的市值正押注于其在人工智能领域的成功,这些收购可能并不像某些专家在报道 Meta 收购 Scale AI 股权时所声称的那样鲁莽。这些收购还涉及与扎克伯格有密切私人关系和共同理念的人士。
Mr Friedman, former boss of GitHub, a software-development platform owned by Microsoft, is friends with Mr Zuckerberg. He is part of Meta’s Advisory Group, which provides guidance to the company. And, like Mr Zuckerberg, he is a lover of ancient Rome. He and Mr Gross helped launch a contest called the Vesuvius Challenge to decode scrolls buried in Herculaneum after Mount Vesuvius erupted in 79AD.
微软旗下软件开发平台 GitHub 的前老板弗里德曼与扎克伯格私交甚笃。作为 Meta 顾问团成员,他为公司提供战略指导。与扎克伯格一样,他也是古罗马文化的爱好者。他与格罗斯共同发起了"维苏威挑战赛",旨在破译公元 79 年维苏威火山爆发后掩埋在赫库兰尼姆的古卷轴。
Mr Friedman and Mr Gross are savvy AI investors. Some call their VC firm the AI equivalent of Andreessen Horowitz, a Silicon Valley juggernaut born out of the dotcom boom. Mr Friedman invested in Scale AI and is close to Mr Wang. Mr Gross is a co-founder of Mr Sutskever’s SSI, which was recently valued at $32bn less than a year after its birth. It is not clear what will happen to NFDG.
弗里德曼和格罗斯是精明的 AI 投资人。有人称他们的风投公司是安德森·霍洛维茨的 AI 版——后者是互联网泡沫时期崛起的硅谷巨擘。弗里德曼投资了 Scale AI,与王先生关系密切。格罗斯则是苏茨克沃 SSI 公司的联合创始人,这家成立不足一年的公司近期估值已达 320 亿美元。NFDG 将何去何从尚不明朗。
People who know the two say that joining Meta appeals not only for the generous terms, but also the excitement of working for an AI heavyweight and the money and computing power it will put at their disposal. “This is the tech battle of our time,” says one person close to the pair. Mr Zuckerberg intends to win. ■
知情人士表示,两人选择加入 Meta 不仅因为优厚条件,更源于为 AI 巨头效力的激情,以及可支配的雄厚资金与算力。"这是我们这个时代的技术战役",一位接近二人的消息人士称。扎克伯格志在必得。■

Crammed and damned  拥挤与困局

4、Why today's graduates are screwed: The bottom has fallen out of the job market
4、为何本届毕业生陷入困境 就业市场已跌入谷底

PITY THE ambitious youngster. For decades the path to a nice life was clear: go to university, find a graduate job, then watch the money come in. Today’s hard-working young, however, seem to have fewer options than before.
为那些怀揣抱负的年轻人感到惋惜。数十年来,通往优渥生活的路径清晰可循:上大学、找份体面工作,然后坐享收入。然而如今勤奋的年轻人,似乎面临着比从前更少的选择。
Go into tech? The big firms are cutting jobs. How about the public sector? Less prestigious than it used to be. Become an engineer? Lots of innovation, from electric vehicles to renewable energy, now happens in China. A lawyer? Artificial intelligence will soon take your job. Don’t even think about becoming a journalist.
投身科技行业?巨头们正在裁员。公共部门如何?声望已大不如前。当工程师?从电动汽车到可再生能源的大量创新正发生在中国。做律师?人工智能很快会抢走你的饭碗。至于新闻记者,更是想都别想。
Across the West, young graduates are losing their privileged position; in some cases, they have already lost it. Jobs data hint at the change. Matthew Martin of Oxford Economics, a consultancy, has looked at Americans aged 22 to 27 with a bachelor’s degree or more. For the first time in history, their unemployment rate is now consistently higher than the national average.
在整个西方世界,年轻毕业生正逐渐丧失其特权地位;在某些领域,这种优势已然消失。就业数据揭示了这一变化。咨询公司牛津经济研究院的马修·马丁研究了 22 至 27 岁拥有学士及以上学位的美国人群。他们的失业率如今持续高于全国平均水平,这在美国历史上尚属首次。
Recent graduates’ rising unemployment is driven by those who are looking for work for the first time.
新近毕业生失业率上升的主要推手,是那些首次求职的年轻人。
The trend is not just apparent in America. Across the European Union the unemployment rate of young folk with tertiary education is approaching the overall rate for the age group (see chart 1). Britain, Canada, Japan: all appear to be on a similar path. Even elite youngsters, such as MBA graduates, are suffering. In 2024, 80% of Stanford’s business-school graduates had a job three months after leaving, down from 91% in 2021. At first glance, the students eating al fresco at the school’s cafeteria look happy. Look again, and you can see the fear in their eyes.
这一趋势不仅在美国显现。在整个欧盟,受过高等教育的年轻人失业率正接近该年龄段的整体水平(见图表 1)。英国、加拿大、日本似乎都呈现出相似态势。即便是精英阶层的年轻人,比如 MBA 毕业生,也未能幸免。2024 年,斯坦福大学商学院毕业生离校三个月内的就业率为 80%,较 2021 年的 91% 有所下降。乍看之下,在学校露天餐厅用餐的学生们似乎心情愉悦,但细察之下,他们眼中难掩忧虑。
Until recently the “university wage premium”, where graduates earn more than others, was growing (see chart 2). More recently, though, it has shrunk, including in America, Britain and Canada. Using data on young Americans from the New York branch of the Federal Reserve, we estimate that in 2015 the median college graduate earned 69% more than the median high-school graduate. By last year, the premium had shrunk to 50%.
直到不久前,"大学薪资溢价"(即毕业生收入高于其他群体)还在持续增长(见图表 2)。然而近年来,包括美国、英国和加拿大在内的国家,这一溢价已开始缩水。根据纽约联邦储备银行提供的美国年轻人数据,我们估算 2015 年大学毕业生中位数收入比高中毕业生高出 69%。而到去年,这一溢价已缩减至 50%。

Town v gown  城镇与学袍之争

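Figure: In Europe and America, unemployment among university graduates now matches or exceeds that of young people overall, and the value of a degree is eroding.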
Jobs are also less fulfilling. A large survey suggests that America’s “graduate satisfaction gap” (how much more likely graduates are to say they are “very satisfied” with their job than non-graduates) is now around three percentage points, down from a long-run advantage of seven.
工作带来的满足感也在降低。一项大型调查显示,美国的"毕业生满意度差距"——即毕业生自称对工作"非常满意"的比例比非毕业生高出多少——现已降至约 3 个百分点,远低于长期保持的 7 个百分点优势。
Is it a bad thing if graduates lose their privileges? Ethically, not really. No group has a right to outperform the average. But practically, it might be. History shows that when brainy people (or people who think they are brainy) do worse than they think they ought to, bad things happen.
毕业生失去特权是坏事吗?从道德层面看并非如此。任何群体都无权凌驾于平均水平之上。但从现实角度看,这可能是个问题。历史表明,当聪明人——或自认为聪明的人——表现不如预期时,往往会导致恶果。
Peter Turchin, a scientist at the University of Connecticut, argues that “elite overproduction” has been the proximate cause of all sorts of unrest over the centuries, with “counter-elites” leading the charge. Historians identify “the problem of an excess of educated men” as contributing to Europe’s revolutions of 1848, for instance. Luigi Mangione would be a member of the counter-elite. Mr Mangione, a University of Pennsylvania graduate, should be living a prosperous life. Instead, he is on trial for the alleged murder of the chief executive of a health insurer. More telling is the degree to which people sympathise with his alienation: Mr Mangione has received donations of well over $1m.
康涅狄格大学科学家彼得·图尔钦提出,"精英过剩"是几个世纪以来各类动荡的直接诱因,而"反精英阶层"往往充当先锋。历史学家指出,欧洲 1848 年革命就与"受教育阶层过剩问题"有关。路易吉·曼吉奥内正是这类反精英阶层的典型代表——这位宾夕法尼亚大学毕业生本应过着优渥生活,却因涉嫌谋杀某医疗保险公司首席执行官而受审。更具警示意义的是民众对其边缘化处境的共情程度:曼吉奥内已获得超百万美元捐款。
Why are graduates losing their privileges? Maybe the enormous expansion of universities lowered standards. If ivory towers admit less-talented applicants, and then do a worse job of teaching them, employers might over time expect fewer differences between the average graduate and the average non-graduate. A recent study, by Susan Carlson of Pittsburg State University and colleagues, suggests that many students today are functionally illiterate. A worrying number of English majors struggle to understand Charles Dickens’s “Bleak House”. Many are bamboozled by the opening line: “Michaelmas term lately over, and the Lord Chancellor sitting in Lincoln’s Inn Hall.”
为何毕业生不再享有优势?或许大学的大规模扩招降低了教育标准。如果象牙塔招收资质平平的学生,又未能提供优质教学,久而久之,雇主们对普通毕业生与非毕业生的能力差异预期自然会缩小。匹兹堡州立大学苏珊·卡尔森与同事的最新研究显示,当今许多大学生实际处于功能性文盲状态——令人担忧的是,连英语专业学生都难以理解狄更斯的《荒凉山庄》,不少人被小说开篇"米迦勒节期刚过,大法官阁下正端坐在林肯法学院大厅"这句话弄得晕头转向。
Certainly some universities do offer rubbish courses to candidates who should not be there. On the other hand, there is little correlation between the number of graduates and the wage premium over the long term: both grew in America
确实有些大学为本不该入学的学生开设了低质课程。但长期来看,毕业生数量与薪资溢价之间几乎不存在相关性:在美国,这两个数据都呈现增长态势

in the 1980s, for instance. Moreover, talk to students at most universities, especially elite ones, and you will be disabused of the notion that they are stupid. Those at Stanford are ferociously intelligent. Many at Oxford and Cambridge once lounged around, and even celebrated a “gentleman’s third”, if they were so honoured. No longer.
以 20 世纪 80 年代为例。此外,若与多数大学(尤其是顶尖学府)的学生交谈,你便会打消他们愚钝的刻板印象——斯坦福学子才思敏捷,牛津剑桥的学子虽曾以闲散度日、甚至以"绅士三等学位"为荣,但这类现象早已不复存在。
A new paper by Leila Bengali of the San Francisco branch of the Fed, and colleagues, is another reason to question the graduates-are-thick explanation. They find that the change in the university wage premium mainly “reflects demand factors, specifically a slowdown in the pace of skill-biased technological change”. In plain English, employers can increasingly get non-graduates to do jobs that were previously the preserve of graduates alone.
旧金山联邦储备银行研究员莱拉·本加利(Leila Bengali)与同事的最新论文,为质疑"毕业生能力下降论"提供了新依据。研究发现,大学薪资溢价的变化主要"反映需求端因素,尤其是技能偏向型技术进步速度的放缓"。简言之,雇主越来越能用非大学毕业生填补原本专属大学毕业生的岗位。

First-class? Nobody cares
一等学位?无人问津

This is especially true for those jobs that require the rudimentary use of technology. Until relatively recently, many people could get to grips with a computer only by attending a university. Now everyone has a smartphone, meaning non-graduates are adept with tech, too. The consequences are clear. In almost every sector of the economy, educational requirements are becoming less strenuous, according to Indeed, a jobs website. America’s professional-and-business services industry employs more people without a university education than it did 15 years ago, even though there are fewer such people around.
对于那些仅需基础技术操作的工作而言尤为如此。直到不久之前,许多人还只能通过上大学才能掌握电脑操作。如今人人都拥有智能手机,这意味着非大学毕业生同样精通技术。其影响显而易见。招聘网站 Indeed 数据显示,几乎每个经济领域的学历要求都在降低。美国专业与商业服务行业雇佣的非大学学历员工比 15 年前更多——尽管这类人群的总量已有所减少。
Employers have also trimmed jobs in graduate-friendly industries. Across the EU the number of 15-to-24-year-olds employed in finance and insurance fell by 16% from 2009 to 2024. America has only slightly more jobs in “legal services” than in 2006. Until recently, the obvious path for a British student hoping to make money was a graduate scheme at a bank. Since 2016, however, the number of twentysomethings in law and finance has fallen by 10%. By the third season of “Industry”, a television drama about graduates at a London bank, a big chunk of the original cast has been pushed out (or has died).
雇主们也在缩减面向毕业生的行业岗位。2009 至 2024 年间,欧盟金融与保险业雇佣的 15-24 岁青年人数下降了 16%。美国"法律服务"行业的就业岗位仅比 2006 年略多。直到不久前,英国学生赚钱的捷径仍是参加银行管培生计划。但自 2016 年以来,法律和金融行业的 20 多岁从业者已减少 10%。在描写伦敦银行毕业生生存图景的电视剧《投行风云》第三季中,大部分初代角色已被淘汰出局(或死亡)。
It is tempting to blame AI for these waning opportunities. The tech looks capable of automating entry-level “knowledge” work, such as filing or paralegal tasks. Yet the trends described in this piece started before ChatGPT. Lots of contingent factors are responsible. Many industries that traditionally employed graduates have had a tough time of late. Years of subdued activity in mergers and acquisitions have trimmed demand for lawyers. Investment banks are less go-getting than before the global financial crisis of 2007-09.
人们很容易将机会萎缩归咎于人工智能。这项技术似乎能自动化处理文件归档、律师助理等初级"知识型"工作。但本文所述趋势早在 ChatGPT 问世前就已显现。多重偶发因素共同导致了这一局面:传统上吸纳毕业生的多个行业近来处境艰难——并购活动多年低迷抑制了律师需求;投资银行的进取精神也远不及 2007-09 年全球金融危机之前。
So is college worth it? Americans seem to have decided not. From 2013 to 2022 the number of people enrolled in bachelor’s programmes fell by 5%, according to data from the OECD. Yet in most rich countries, where higher education is cheaper because the state plays a larger role, youngsters are still funnelling into universities. Excluding America, enrolment across the OECD rose from 28m to 31m in the decade to 2022. In France the number of students went up by 36%; in Ireland by 45%. Governments are subsidising useless degrees, encouraging kids to waste time studying.
上大学值得吗?美国人似乎已给出否定答案。经合组织数据显示,2013 至 2022 年间美国本科项目入学人数下降 5%。但在多数高教费用更低(因政府承担更多)的富裕国家,年轻人仍源源不断涌入大学。若排除美国,经合组织国家 2022 年前十年间入学人数从 2800 万增至 3100 万。法国学生数量增长 36%,爱尔兰增长 45%。政府正在补贴无实用价值的学位,变相鼓励年轻人浪费时间攻读。
Students also may not be picking the right subjects. Outside America, the share in arts, humanities and social sciences mostly grows. So, inexplicably, does enrolment in journalism courses. If these trends reveal young people’s ideas about the future of work, they truly are in trouble.
学生们的专业选择也未必明智。除美国外,艺术、人文社科类专业占比普遍增长。新闻学课程报名人数竟也莫名增加。若这些趋势折射出年轻人对职业前景的认知,那他们确实面临困境。

AI benchmarking  AI 基准测试

5、How to find the smartest AI
5、如何找到最聪明的人工智能

THE DIZZYING array of letters splattered across the page of one of Jonathan Roberts’s visual-reasoning questions resembles a word search assembled by a sadist. Test-takers aren’t merely tasked with finding the hidden words in the image, but with spotting a question written in the shape of a star and then answering that in turn (see below).
乔纳森·罗伯茨视觉推理题页面上令人眼花缭乱的字母阵列,宛如施虐者设计的单词寻宝游戏。应试者不仅需要在图像中找出隐藏单词,还要发现以星形图案呈现的问题并作答(见下图)。

AI IQ  AI 智商

Sample problem from the ZeroBench test
ZeroBench 测试样题

Answer the question written in the shape of a star among the mess of letters
请从混乱字母中找出星形图案内的问题并作答

Source:ZeroBench See economist.com/aipuzzle for answer
来源:ZeroBench 答案请见 economist.com/aipuzzle

Figure: a sample question from the latest AI tests
The intention of Mr Roberts's anthology of a hundred questions is not to help people pass the time on the train. Instead, it is to provide cutting-edge artificial-intelligence (AI) models like o3-pro, June's top-tier release from OpenAI, with a test worthy of their skills.
罗伯茨先生编纂的百题集初衷并非供人们在火车上消磨时间,而是为前沿人工智能(AI)模型——如 OpenAI 六月发布的顶级产品 o3-pro——提供与其能力相匹配的测试。
There is no shortage of tests for AI models. Some seek to measure general
针对 AI 模型的测试并不匮乏。有些旨在衡量通用

knowledge, others are subject-specific. There are those that aim to assess everything from puzzle-solving and creativity to conversational ability. But not all of these so-called benchmarking tests do what they claim to. Many were hurriedly assembled, with flaws and omissions; were too easy to cheat on, having filtered into the training data of AI models; or were just too easy for today’s “frontier” systems.
知识储备,另一些则聚焦专业领域。既有评估解谜能力与创造力的测试,也有衡量对话技巧的评估体系。但并非所有号称基准测试的项目都名副其实:许多测试仓促拼凑,存在缺陷与疏漏;或因渗入 AI 模型的训练数据而极易作弊;抑或对当今"前沿"系统而言过于简单。
ZeroBench, the challenge launched by Mr Roberts and his colleagues at the University of Cambridge, is one prominent alternative. It is targeted at large multimodal models (AI systems that can take images as well as text as input) and aims to present a test that is easy(ish) for the typical person and impossible for state-of-the-art models. For now, no large language model (LLM) can score a single point. Should some upstart one day do better, it would be quite an achievement.
剑桥大学罗伯茨教授及其同事发起的 ZeroBench 挑战赛是一个重要替代方案。该测试专门针对大型多模态模型(能同时处理图像和文本输入的人工智能系统),旨在设计出对普通人相对简单、但对最先进模型却无法通过的考题。目前没有任何大型语言模型(LLM)能在此测试中获得哪怕 1 分。若有新秀模型某天能突破这一纪录,将堪称重大突破。
ZeroBench isn’t alone. EnigmaEval is a collection of more than a thousand multimodal puzzles assembled by Scale AI, an AI data startup. Unlike ZeroBench, EnigmaEval doesn’t try to be easy for anyone. The puzzles, curated from a variety of pre-existing online quizzing resources, start at the difficulty of a fiendish cryptic crossword and get harder from there. When advanced AI systems are pitted against the hardest of these problems, their median score is zero. A frontier model from Anthropic, an AI lab, is the only model to have got a single one of these questions right.
ZeroBench 并非孤例。人工智能数据初创公司 Scale AI 构建的 EnigmaEval 题库包含上千道多模态谜题。与 ZeroBench 不同,EnigmaEval 并不追求题目简单化——这些从现有在线智力题库中精选的谜题,起始难度就堪比晦涩的密码填字游戏,之后更是层层加码。当先进 AI 系统面对其中最难题时,其中位数得分为零。人工智能实验室 Anthropic 开发的尖端模型,是目前唯一答对过其中一道题的模型。
Other question sets attempt to track more specific abilities. METR, an AI-safety group, for instance, tracks the length of time it would take people to perform individual tasks that AI models are now capable of (Anthropic is the first to break the hour mark). Another benchmark, the brashly named “Humanity’s Last Exam”, tests knowledge, rather than intelligence, with questions from the front line of human knowledge garnered from nearly a thousand academic experts.
其他测试集试图追踪更具体的能力。例如,人工智能安全组织 METR 通过统计人类完成 AI 模型现有单项任务所需时长来评估模型能力(Anthropic 首次突破一小时大关)。另一个名为"人类终极考试"的基准测试则侧重知识而非智力,其试题来自近千名学术专家提供的人类知识前沿内容。
One of the reasons for the glut of new tests is a desire to avoid the mistakes of the past. Older benchmarks abound with sloppy phrasings, bad mark schemes
新测试大量涌现的原因之一是为了避免重蹈覆辙。旧版基准测试普遍存在表述含糊、评分标准混乱

or unfair questions. ImageNet, an early image-recognition data set, is an infamous example: a model that describes a photograph of a mirror in which fruit is reflected is penalised for saying the picture is of a mirror, but rewarded for identifying a banana.
或题目设计不公等问题。早期图像识别数据集 ImageNet 就是臭名昭著的案例:若模型将反射着水果的镜子照片描述为"镜子"会被扣分,但若识别出"香蕉"却能得分。
It is impossible to ask models to solve corrected versions of these tests without compromising researchers’ ability to compare them with models that took the flawed versions. Newer tests, produced in an era when AI research is flush with resources, can be laboriously vetted to spot such errors ahead of production.
若要求模型解答修正版试题,研究者将无法将其与完成缺陷版测试的模型进行对比。在 AI 研究资源充沛的时代,新型测试可在发布前经过严格审查来规避此类错误。
The second reason for the rush to build new tests is that models have learned the old ones. It has proved hard to keep any common benchmark out of the training data used by labs to train their models, resulting in systems that perform better on the exams than they do in normal tasks.
急于构建新测试的第二个原因是,现有模型已掌握旧测试内容。事实证明,实验室很难确保模型训练数据完全避开常见基准测试,这导致 AI 系统在考试中的表现优于日常任务。
The third, and most pressing, issue motivating the creation of new tests is saturation: AI models coming close to getting full marks. On a selection of 500 high-school maths problems, for example, o3-pro is likely to get a near-perfect score. But as o1-mini, released nine months earlier, scored 98.9%, the results do not offer observers a real sense of progress in the field.
推动新测试研发的第三个(也是最紧迫的)问题是性能饱和:AI 模型正接近满分。例如在 500 道高中数学题测试中,o3-pro 可能获得接近满分的成绩。但由于九个月前发布的 o1-mini 已取得 98.9% 的成绩,这些结果无法让观察者真正感知该领域的进步。
This is where ZeroBench and its peers come in. Each tries to measure a particular way AI capabilities are approaching, or exceeding, those of humans. Humanity’s Last Exam, for instance, sought to devise intimidating general-knowledge questions (its name derives from its status as the most fiendish such test it is possible to set), asking for anything from the number of tendons supported by a particular hummingbird bone to a translation of a stretch of Palmyrene script found on a Roman tombstone. In a future where many AI models can score full marks on such a test, benchmark-setters may have to move away from knowledge-based questions entirely.
这正是 ZeroBench 及其同类基准测试的用武之地。它们各自试图衡量人工智能在特定能力上接近或超越人类的程度。例如,《人类终极测试》旨在设计令人望而生畏的通识考题(其名称源于该测试被设定为可能设计出的最刁钻考核),题目从某种蜂鸟骨骼支撑的肌腱数量,到罗马墓碑上发现的帕尔米拉文字片段翻译,无所不包。当未来许多 AI 模型都能在此类测试中获得满分时,基准制定者或许需要彻底摒弃基于知识的考题。
But even evaluations which are supposed to stand the test of time get toppled overnight. ARC-AGI, a non-verbal reasoning quiz, was introduced in 2024 with
但即便是那些本应经得起时间考验的评估体系,也可能在一夜之间被推翻。2024 年推出的非语言推理测试 ARC-AGI,

the intention of being hard for AI models. Within six months, OpenAI announced a model, o3, capable of scoring 91.5%.
让 AI 模型难以应对的初衷。六个月内,OpenAI 就发布了能获得 91.5%分数的 o3 模型。
For some AI developers, existing benchmarks miss the point. OpenAI’s boss Sam Altman hinted at the difficulties of quantifying the unquantifiable when the firm released its GPT-4.5 in February. The system “won’t crush benchmarks”, he tweeted. Instead, he added, before publishing a short story the model had written, “There’s a magic to it I haven’t felt before.”
对一些 AI 开发者而言,现有的基准测试未能触及核心。当 OpenAI 在 2 月发布 GPT-4.5 时,其老板山姆·奥特曼暗示了量化不可量化之物的困难。他在推特上表示该系统"不会碾压基准测试",随后在发布模型创作的短篇小说前补充道:"它有种我从未感受过的魔力。"
Some are trying to quantify that magic. Chatbot Arena, for example, allows users to have blind chats with pairs of LLMs before being asked to pick which is “better”, however they define the term. Models that win the most matchups float to the top of the leaderboard. This less rigid approach appears to capture some of that ineffable “magic” that other ranking systems cannot. They too, however, can be gamed, with more ingratiating models scoring higher with seducible human users.
有人正试图量化这种魔力。例如 Chatbot Arena 让用户与成对的 LLMs 进行盲聊后,选择哪一方"更优秀"——无论用户如何定义这个标准。胜率最高的模型会登上排行榜顶端。这种非刚性方法似乎捕捉到了其他排名系统无法触及的玄妙"魔力"。但这类系统同样存在操纵空间,更会讨好的模型往往能从易受影响的用户那里获得更高评分。
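The matchup-based ranking described above can be illustrated with a minimal sketch. It assumes the simplest possible scoring rule, the share of blind comparisons a model wins; that rule is an assumption made for illustration, not a description of how Chatbot Arena actually computes its leaderboard. The function name `leaderboard` and the model names are hypothetical.

```python
from collections import defaultdict


def leaderboard(votes):
    """Rank models by the share of pairwise matchups they won.

    votes: iterable of (winner, loser) pairs, one per blind comparison.
    Returns a list of (model, win_rate) sorted from best to worst.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = {model: wins[model] / games[model] for model in games}
    # Sort by win rate (descending), then by name for a stable order.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical votes; the model names are placeholders, not real systems.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]
ranking = leaderboard(votes)
for model, rate in ranking:
    print(f"{model}: {rate:.2f}")
```

Because every vote reflects whatever a user happens to prefer, a rule like this rewards models that please voters, which is the gaming risk the paragraph mentions.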
Others, borrowing an argument familiar to anyone with school-age children, question what any test can reveal about an AI model beyond how good it is at passing that test. Simon Willison, an independent AI researcher in California, encourages users to keep track of the queries that existing AI systems fail to fulfil before posing them to their successors. That way users can select models that do well at the tasks that matter to them, rather than high-scoring systems ill-suited to their needs.
另一些人借用了一个家有学龄儿童的人都熟悉的论点,质疑除了展示 AI 模型在特定测试中的表现外,这些测试还能揭示什么。加州独立 AI 研究员西蒙·威尔逊建议用户先记录现有 AI 系统无法满足的查询需求,再将这些问题抛给新一代模型。这样用户就能筛选出在真正重要任务上表现优异的模型,而非选择那些虽得高分却不符实际需求的系统。
All this assumes that AI models are giving the tests facing them their best shot. Sandbagging, in which models deliberately fail tests in order to hide their true capabilities (in order to, for example, prevent themselves from being deleted), has been observed in a growing number of models. In a report published in May by researchers at MATS, an AI-safety group, top LLMs were able to identify when they were being tested almost as well as the researchers themselves. This too complicates the quest for reliable benchmarks.
以上讨论都建立在 AI 模型会全力应对测试的前提下。但越来越多模型表现出"压分"行为——故意在测试中表现不佳以隐藏真实能力(例如为防止自身被淘汰)。AI 安全组织 MATS 五月发布的研究报告显示,顶尖 LLMs 识别测试场景的能力几乎与研究人员相当。这种现象也让建立可靠基准的追求变得更加复杂。
That being said, the value to AI companies of simple leaderboards which their products can top means the race to build better benchmarks will continue. ARC-AGI 2 was released in March, and still eludes today’s top systems. But, aware of how quickly that might change, work on ARC-AGI 3 has already begun.
话虽如此,对 AI 公司而言,那些能让自家产品登顶的简单排行榜很有价值,这意味着构建更优基准的竞赛将持续进行。ARC-AGI 2 于三月发布,至今仍令顶尖系统望尘莫及。但意识到情况可能瞬息万变,ARC-AGI 3 的研发工作已然启动。

Research rankings  研究机构排名

6. Are China's universities really the best in the world? Nature's prestigious index says yes
6. 中国大学真是全球最佳吗?《自然》权威指数给出肯定答案

A DECADE AGO Nature, a scientific publisher, began tallying the contributions made by researchers at different institutions to papers published across a set of 145 respected journals. When the first such Nature Index was published in 2016, the Chinese Academy of Sciences (CAS) ranked first, but American and European institutions dominated the top ten. Harvard placed second, with Stanford and MIT fifth and sixth; the French National Centre for Scientific Research (CNRS) and the German Max Planck Society were third and fourth; Oxford and Cambridge took ninth and tenth (seventh and eighth place went, respectively, to the Helmholtz Association of German Research Centres and the University of Tokyo).
十年前,科学出版商《自然》开始统计不同机构研究人员在 145 种权威期刊上发表论文的贡献度。2016 年首期自然指数发布时,中国科学院(CAS)虽位列第一,但前十名仍由欧美机构主导。哈佛大学位居第二,斯坦福与麻省理工分列第五、六位;法国国家科研中心(CNRS)和德国马克斯·普朗克学会占据第三、四席;牛津与剑桥分列第九、十名(第七、八名则分别归属德国亥姆霍兹联合会和东京大学)。
Gradually, however, the table has turned. In 2020 Tsinghua University, in Beijing, entered the top ten. By 2022 Oxford and Cambridge were out, replaced by two Chinese rivals. Come 2024 only three Western institutions remained in the top ten: Harvard, CNRS and the Max Planck Society. This year, Harvard ranks second and Max Planck ninth. Eight of the top ten are Chinese.
然而,形势已逐渐逆转。2020 年,北京的清华大学跻身前十。到 2022 年,牛津和剑桥被挤出榜单,取而代之的是两所中国高校。2024 年,西方机构仅剩哈佛大学、法国国家科研中心和马克斯·普朗克学会三家留在前十。今年哈佛位列第二,马克斯·普朗克排名第九,而前十名中中国机构已占据八席。
The shift reflects a real and rapid improvement in China’s research capabilities. Over the past decade the country has increased its spending on research and development by roughly 9% annually in real terms. In 2023, adjusting for purchasing power, China outspent both America and the European Union on combined government and higher-education R&D. The country has also drawn back many Chinese researchers who were once based abroad, a cohort known as haigui (sea turtles), a homophone for “returning from across the sea”.
这一转变反映出中国研究能力的真实快速提升。过去十年间,中国的研发支出实际年增长率约为 9%。2023 年按购买力平价计算,中国在政府和高等教育研发总支出上已超过美国和欧盟。中国还吸引了许多曾旅居海外的中国研究人员回国,这个群体被称为"海龟",与意为"从海外归来"的"海归"谐音。
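To put the growth figure in perspective, here is a small worked calculation. It assumes, purely for illustration, a constant 9% real increase in every one of ten years; the article gives only a rough annual average, so the result is indicative rather than exact, and the function name is a hypothetical helper.

```python
def compound_growth(rate, years):
    """Return the cumulative multiple after compounding `rate` for `years`."""
    return (1 + rate) ** years


# Assumption: a flat 9% real increase every year for ten years.
multiple = compound_growth(0.09, 10)
print(f"Spending multiple after 10 years: {multiple:.2f}x")
```

Under that assumption, real R&D spending would end the decade at a little under two and a half times its starting level.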
All this has paid off. The country now publishes more high-impact papers (those in the most-highly cited 1%) than either America or Europe. In fields like chemistry, engineering and materials science the country is now considered a world leader. China also produces a huge volume of high-quality computer-science research. Zhejiang University, fourth in the 2025 index, was the alma mater of Liang Wenfeng, the founder of DeepSeek, China’s cutting-edge artificial-intelligence (AI) company.
这一切努力终见成效。如今中国发表的高影响力论文(引用率前 1%的顶尖论文)数量已超越美国和欧洲。在化学、工程学和材料科学等领域,中国已被视为全球领导者。中国还产出了大量高质量的计算机科学研究成果。在 2025 年指数中排名第四的浙江大学,正是中国前沿人工智能企业深度求索(DeepSeek)创始人梁文锋的母校。
Yet the way the rankings are created plays to China’s strengths. The journals included in the index are chosen to be representative of top-tier research across the natural sciences, with the composition regularly tweaked to reflect the state of the field. A growing number of publications in chemistry and physical-science journals has led to their share increasing to just over half those used in the 2025 index. Papers from health and biological-science journals, however, which remain an area of Western dominance, account for only 20% of the index.
不过排名机制的设定恰好放大了中国的优势。该指数遴选期刊时旨在反映自然科学领域的顶尖研究水平,并会定期调整期刊构成以体现学科发展态势。随着化学和物理科学类期刊发文量持续增长,其在 2025 年指数中的占比已提升至半数以上。而西方仍占据优势的健康与生物科学类期刊论文,在指数中仅占 20%的权重。
China’s research centres also tumble down the table when the studies under consideration are limited to those published in Nature and Science, the two journals widely regarded as the most prestigious. CAS is the only institution in that country near the top of that leaderboard, placing fourth.
当研究范围仅限于《自然》和《科学》这两本公认最具声望的期刊时,中国研究中心的排名也出现下滑。中国科学院是该国唯一跻身该榜单前列的机构,位列第四。
Observers should treat these rankings with caution. Although the Nature Index is a useful measure of an institution or country’s scientific might, its assessments are inevitably incomplete. Plenty of valuable research is published in lower-tier journals, and world-changing innovation will not always come from high-scoring institutions. That being said, Zhejiang, Peking and Tsinghua universities have earned their place with CAS among the world’s best.
观察人士应谨慎看待这些排名。尽管自然指数是衡量机构或国家科研实力的有用指标,但其评估必然存在局限性。许多有价值的研究发表在较低级别的期刊上,改变世界的创新并不总是来自高分机构。尽管如此,浙江大学、北京大学和清华大学与中国科学院共同跻身世界顶尖机构之列实至名归。