Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law
Increased efficiency can sometimes, counterintuitively, lead to worse outcomes.
This is true almost everywhere.
We will name this phenomenon the strong version of Goodhart's law.
As one example, more efficient centralized tracking of student progress by standardized testing
seems like such a good idea that well-intentioned laws mandate it.
However, testing also incentivizes schools to focus more on teaching students to test well, and less on teaching broadly useful skills.
As a result, it can cause overall educational outcomes to become worse.
Similar examples abound, in politics, economics, health, science, and many other fields.
This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting.
Overfitting is heavily studied, somewhat theoretically understood, and has well known mitigations.
This connection between the strong version of Goodhart's law in general, and overfitting in machine learning, provides a new
lens for understanding bad outcomes, and new ideas for fixing them.
Overfitting and Goodhart's law
In machine learning (ML), overfitting is a pervasive phenomenon. We want to train an ML model to achieve some goal. We can't directly fit the model to the goal, so we instead train the model using some proxy which is similar to the goal.
For instance,
as an occasional computer vision researcher,
my
goal is sometimes to
prove that my new image classification model works well.
I accomplish this by measuring its accuracy, after asking
it to label images (is this image a cat or a dog or a frog or a truck or a ...)
from a standardized
test dataset of images.
I'm not allowed to train my model on the test dataset though (that would be cheating),
so I instead train the model on a proxy dataset, called the training dataset.
I also can't directly target prediction accuracy during training1, so I instead target a proxy objective which is only related to accuracy.
So rather than training my model on the goal I care about — classification accuracy on a test dataset — I instead train it using a proxy objective on a proxy dataset.
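To make the goal/proxy split concrete, here is a minimal sketch (a toy illustration of my own, not real training code): the goal is classification accuracy on the held-out test set, while the proxy is a differentiable softmax cross-entropy loss evaluated on the training set. The tiny linear model and the random stand-in data are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-ins: 100 tiny "images" with 32 features each, 10 classes, a linear model.
train_images, train_labels = rng.normal(size=(100, 32)), rng.integers(0, 10, 100)
test_images, test_labels = rng.normal(size=(500, 32)), rng.integers(0, 10, 500)
params = 0.01 * rng.normal(size=(32, 10))

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def goal(params):
    """What I actually care about: accuracy on the held-out test dataset."""
    return np.mean(np.argmax(test_images @ params, axis=1) == test_labels)

def proxy(params):
    """What I actually optimize: differentiable softmax cross-entropy,
    measured on the proxy (training) dataset rather than on the test set."""
    probs = softmax(train_images @ params)
    return -np.mean(np.log(probs[np.arange(len(train_labels)), train_labels] + 1e-12))

print(f"proxy objective: {proxy(params):.3f}   goal (test accuracy): {goal(params):.3f}")
```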
At first everything goes as we hope — the proxy improves, and since the goal is similar to the proxy, it also improves.
As we continue optimizing the proxy though, we eventually exhaust the useable similarity between proxy and goal. The proxy keeps on getting better, but the goal stops improving. In machine learning we call this overfitting, but it is also an example of Goodhart's law.
Goodhart's law states that, when a measure becomes a target, it ceases to be a good measure2.
Goodhart proposed this in the context of monetary policy, but it applies far more broadly. In the context of overfitting in machine learning, it describes how the proxy objective we optimize ceases to be a good measure of the objective we care about.
The strong version of Goodhart's law: as we become too efficient, the thing we care about grows worse
If we keep on optimizing the proxy objective, even after our goal stops improving, something more worrying happens. The goal often starts getting worse, even as our proxy objective continues to improve. Not just a little bit worse either — often the goal will diverge towards infinity.
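As a toy numerical illustration of this divergence (an invented example, not drawn from any real system), the sketch below fits a linear model with exactly as many parameters as noisy training points by running gradient descent on the proxy objective. The proxy (training loss) improves monotonically with every step, while the goal (test error) typically improves for a while and then grows dramatically worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# A critically parameterized toy problem: as many parameters as noisy training points.
n, d = 20, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n)   # proxy dataset, with noisy labels
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_true                                  # goal: predict the noise-free signal

w, lr = np.zeros(d), 0.1
for step in range(1, 300_001):
    w -= lr * X_train.T @ (X_train @ w - y_train) / n     # gradient step on the proxy objective
    if step in (10, 100, 1_000, 10_000, 100_000, 300_000):
        proxy = np.mean((X_train @ w - y_train) ** 2)     # keeps shrinking monotonically ...
        goal = np.mean((X_test @ w - y_test) ** 2)        # ... typically shrinks, then blows up
        print(f"step {step:>7}: proxy {proxy:9.5f}   goal {goal:9.2f}")
```

The early steps mostly fit the signal shared by the proxy and the goal; the later steps mostly fit the noise that exists only in the proxy dataset, which is exactly the useable similarity being exhausted.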
This is an extremely general phenomenon in machine learning. It mostly doesn't matter what our goal and proxy are, or what model architecture we use3. If we are very efficient at optimizing a proxy, then we make the thing it is a proxy for grow worse.
Though this phenomenon is often discussed, it doesn't seem to be named4. Let's call it the strong version of Goodhart's law5.
We can state it as:
When a measure becomes a target, if it is effectively optimized, then the thing it is designed to measure will grow worse.
Goodhart's law says that if you optimize a proxy, eventually the goal you care about will stop improving.
The strong version of Goodhart's law differs
in that it says that as you over-optimize, the goal you care about won't just stop improving,
but will instead grow much worse than if you had done nothing at all.
Goodhart's law applies well beyond economics, where it was originally proposed. Similarly, the strong version of Goodhart's law applies well beyond machine learning. I believe it can help us understand failures in economies, governments, and social systems.
Increasing efficiency and overfitting are happening everywhere
Increasing efficiency is permeating almost every aspect of our society. If the thing that is being made more efficient is beneficial, then the increased efficiency makes the world a better place (overall, the world seems to be becoming a better place). If the thing that is being made more efficient is socially harmful, then the consequences of greater efficiency are scary or depressing (think mass surveillance, or robotic weapons).
What about the most common case though — where the thing we are making more efficient is related, but not identical, to beneficial outcomes? What happens when we get better at something which is merely correlated with outcomes we care about?
In that case, we can overfit, the same as we do in machine learning. The outcomes we care about will improve for a while ... and then they will grow dramatically worse.
Below are a few, possibly facile, examples applying this analogy.
Goal: Educate children well
Proxy: Measure student and school performance on standardized tests
Strong version of Goodhart's law leads to: Schools narrowly focus on teaching students to answer questions like those on the test, at the expense of the underlying skills the test is intended to measure
Goal: Rapid progress in science
Proxy: Pay researchers a cash bonus for every publication
Strong version of Goodhart's law leads to: Publication of incorrect or incremental results, collusion between reviewers and authors, research paper mills
Goal: A well-lived life
Proxy: Maximize the reward pathway in the brain
Strong version of Goodhart's law leads to: Substance addiction, gambling addiction, days lost to doomscrolling Twitter
Goal: Healthy population
Proxy: Access to nutrient-rich food
Strong version of Goodhart's law leads to: Obesity epidemic
Goal: Leaders that act in the best interests of the population
Proxy: Leaders that have the most support in the population
Strong version of Goodhart's law leads to: Leaders whose expertise and passions center narrowly around manipulating public opinion at the expense of social outcomes
Goal: An informed, thoughtful, and involved populace
Proxy: The ease with which people can share and find ideas
Strong version of Goodhart's law leads to: Filter bubbles, conspiracy theories, parasitic memes, escalated tribalism
Goal: Distribution of labor and resources based upon the needs of society
Proxy: Capitalism
Strong version of Goodhart's law leads to: Massive wealth disparities (with incomes ranging from hundreds of dollars per year to hundreds of dollars per second), with more than a billion people living in poverty
Goal: The owners of Paperclips Unlimited, LLC, become wealthy
Proxy: Number of paperclips made by the AI-run manufacturing plant
Strong version of Goodhart's law leads to: The entire solar system, including the company owners, being converted to paperclips
As an exercise for the reader, you can think about how the strong version of Goodhart's law would apply to other efficiencies, like the ones in this list:
telepresence and virtual reality
personalized medicine
gene therapy
tailoring marketing messages to the individual consumers or voters who will find them most actionable
predicting the outcome of elections
writing code
artificial intelligence
reducing slack in supply chains
rapidly disseminating ideas
generating entertainment
identifying new products people will buy
raising livestock
trading securities
extracting fish from the ocean
constructing cars
How do we mitigate the problems caused by overfitting and the strong version of Goodhart's law?
If overfitting is useful as an analogy, it will be because some of the approaches that improve it in machine learning also transfer to other domains.
Below, I review some of the most effective techniques from machine learning, and share some thoughts about how they might transfer.
- Mitigation: Better align proxy goals with desired outcomes. In machine learning this often means carefully collecting training examples which are as similar as possible to the situation at test time.
Outside of machine learning, this means changing the proxies we have control over — e.g. laws, incentives, and social norms — so that they directly encourage behavior that better aligns with our goals. This is the standard approach used to (try to) engineer social systems.
- Mitigation: Add regularization penalties to the system. In machine learning, this is often performed by penalizing the squared magnitude of parameters, so that they stay small. Importantly, regularization doesn't need to directly target undesirable behavior. Almost anything that penalizes deviations of a model from typicality works well. (A toy code sketch of this mitigation, along with noise injection and early stopping, appears at the end of this section.)
Outside of machine learning, anything that penalizes complexity, or adds friction or extra cost to a system, can be viewed as regularization. Some example ideas:
- Add a billing mechanism to SMTP, so there's a small cost for every email.
- Use a progressive tax code, so that unusual success is linked to disproportionately greater cost
- Charge a court fee proportional to the squared (exponentiated?) number of lawsuits initiated by an organization, so that unusual use of the court system leads to unusual expenses
- Tax the number of bits of information stored about users
- Mitigation: Inject noise into the system. In machine learning, this involves adding random jitter to the inputs, parameters, and internal state of a model. The unpredictability resulting from this noise makes overfitting far more difficult.
Here are some ideas for how to improve outcomes by injecting noise outside of machine learning:
- Stack rank all the candidates for a highly competitive school or job. Typically, offers would be made to the top-k candidates. Instead, make offers probabilistically, with probability a smoothly decreasing function of [candidate's stack rank], with a decay scale set by [approx # top tier candidates].
Benefits include: greater diversity of accepted candidates; less ridiculous resources spent by the candidates tuning their application, and by application reviewers reviewing the applications, since small changes in assessed rank only have a small effect on outcome probabilities; occasionally you will draw a longshot candidate that is more likely to fail, but also more likely to succeed in an unconventional and unusually valuable way.
- Randomly time quizzes and tests in a class, rather than giving them on pre-announced dates, so that students study to understand the material more, and cram (i.e., overfit) for the test less.
- Require securities exchanges to add random jitter to the times when they process trades, with a standard deviation of about a second. (An efficient market is great.
Building a global financial system out of a chaotic nonstationary dynamical system with a characteristic timescale more than six orders of magnitude faster than human reaction time is just asking for trouble.)
- Randomize details of the electoral system on voting day, in order to prevent candidates from overfitting to incidental details of the current electoral system (e.g. by taking unreasonable positions that appeal to a pivotal minority).
For instance randomly select between ranked choice or first past the post ballots, or randomly rescale the importance of votes from different districts. (I'm not saying all of these are good ideas. Just ... ideas.)
- Mitigation: Early stopping. In machine learning, it's common to monitor a third metric, besides training loss and test performance, which we call validation loss. When the validation loss starts to get worse, we stop training, even if the training loss is still improving.
This is the single most effective tool we have to prevent catastrophic overfitting. Here are some ways early stopping could be applied outside of machine learning:
- Sharply limit the time between a call for proposals and submission date, so that proposals better reflect pre-existing readiness, and to avoid an effect where increasing resources are poured into proposal generation, rather than being used to create something useful
- Whenever stock volatility rises above a threshold, suspend all market activity
- The use of antitrust law to split companies that are preventing competition in a market
- Estimate the importance of a decision in $$. When the value of the time you have already spent analyzing the decision approaches that value, make a snap decision.
- Freeze the information that agents are allowed to use to achieve their goals. Press blackouts in the 48 hours before an election might fall under this category.
One of the best understood causes of extreme overfitting is that the expressivity of the model being trained too closely matches the complexity of the proxy task.
When the model is very weak, it can only make a little bit of progress on the task, and it doesn’t exhaust the similarity between the goal and the proxy.
When the model is extremely strong and expressive, it can optimize the proxy objective in isolation, without inducing extreme behavior on other objectives.
When the model's expressivity roughly matches the task complexity (e.g., the number of parameters is no more than a few orders of magnitude higher or lower than the number of training examples), then it can only do well on the proxy task by doing extreme things everywhere else. See Figure 1 for a demonstration of this idea on a simple task.
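Figure 1 isn't reproduced here, but its flavor is easy to recreate. The sketch below (another invented toy example: random ReLU features stand in for model capacity, and the specific numbers are arbitrary) fits noisy samples of a simple function with a minimum-norm least-squares solution. The test error is typically worst when the number of features is close to the number of training examples, and far milder when the model is much weaker or much more expressive.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train = 40
x_train = rng.uniform(-1, 1, (n_train, 1))
y_train = np.sin(3 * x_train[:, 0]) + 0.1 * rng.normal(size=n_train)   # noisy proxy data
x_test = np.linspace(-1, 1, 500)[:, None]
y_test = np.sin(3 * x_test[:, 0])                                       # the goal

def relu_features(x, n_features, seed=1):
    """Random ReLU features: a crude stand-in for models of varying expressivity."""
    frng = np.random.default_rng(seed)
    W = frng.normal(size=(1, n_features))
    b = frng.uniform(-1, 1, n_features)
    return np.maximum(x @ W + b, 0.0)

for n_features in (5, 20, 40, 80, 2000):   # model capacity, vs. 40 training points
    F_train = relu_features(x_train, n_features)
    F_test = relu_features(x_test, n_features)
    w, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)   # min-norm least-squares fit
    test_err = np.mean((F_test @ w - y_test) ** 2)
    print(f"{n_features:5d} features: goal (test error) {test_err:10.4f}")
```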
This cause of overfitting motivates two final, diametrically opposed, methods for mitigating the strong version of Goodhart’s law.
- Mitigation: Restrict capabilities / capacity. In machine learning, this is often achieved by making the model so small that it's incapable of overfitting. In the broader world, we could similarly limit the capacity of organizations or agents. Examples include:
- Campaign finance limits
- Set a maximum number of people that can work in companies of a given type. e.g. allow only 10 people to work in any lobbying group
- Set the maximum number of parameters, or training compute, that any AI system can use.
- Mitigation: Increase capabilities / capacity. In machine learning, if a model is made very big, it often has enough capacity to overfit to the training data without making performance on the test data worse.
In the broader world, this would correspond to developing capabilities that are so great that there is no longer any tradeoff required between performance on the goal and the proxy. Examples include:
- Obliterate all privacy, and make all the information about all people, governments, and other organizations available to everyone all the time, so that everyone can have perfect trust of everyone else.
This could be achieved by legislating that every database be publicly accessible, and by putting cameras in every building. (to be clear — from my value system, this would be a dystopian scenario)
- Invest in basic research in clean energy
- Develop as many complex, inscrutable, and diverse market trading instruments as possible, vesting on as many timescales as possible. (In nature, more complex ecosystems are more stable. Maybe there is a parallel for markets?)
- Use the largest, most compute and data intensive, AI model possible in every scenario 😮6
This last mitigation of just continuing to increase capabilities works surprisingly well in machine learning.
It is also a path of least resistance.
Trying to fix our institutions by blindly making them better at pursuing misaligned goals is a terrible idea though.
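To make three of the machine-learning mitigations above (regularization, noise injection, and early stopping) concrete, here is a toy code sketch reusing the critically parameterized regression problem from earlier. The penalty strength, noise level, and validation split are arbitrary illustrative choices; the point is only that each mitigated fit typically achieves a much better goal (test error) than the fully optimized, unmitigated fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# The same critically parameterized toy problem as in the earlier sketch.
n, d = 20, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_true

def goal(w):
    """Test error: the thing we actually care about."""
    return np.mean((X_test @ w - y_test) ** 2)

# No mitigation: fully optimize the proxy (ordinary least squares).
w_none = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

# Mitigation: regularization (penalize the squared magnitude of the parameters).
lam = 1.0
w_ridge = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)

# Mitigation: inject noise, by training on many noisy copies of the inputs.
X_noisy = np.concatenate([X_train + 0.5 * rng.normal(size=X_train.shape) for _ in range(50)])
y_noisy = np.tile(y_train, 50)
w_noise = np.linalg.lstsq(X_noisy, y_noisy, rcond=None)[0]

# Mitigation: early stopping, monitored on a small held-out validation split.
X_tr, y_tr, X_val, y_val = X_train[:15], y_train[:15], X_train[15:], y_train[15:]
w, best_w, best_val = np.zeros(d), np.zeros(d), np.inf
for _ in range(100_000):
    w -= 0.1 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:
        best_val, best_w = val_err, w.copy()

for name, w_hat in [("no mitigation", w_none), ("regularization", w_ridge),
                    ("noise injection", w_noise), ("early stopping", best_w)]:
    print(f"{name:15s} goal (test error) {goal(w_hat):9.2f}")
```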
Parting thoughts
The strong version of Goodhart's law underlies most of my personal fears around AI (expect a future blog post about my AI fears!).
If there is one thing AI will enable, it is greater efficiency, on almost all tasks, over a very short time period.
We are going to need to simultaneously deal with massive numbers of diverse unwanted side effects,
just as our ability to collaborate on solutions is also disrupted.
There's a lot of opportunity to research solutions to this problem.
If you are a scientist looking for research ideas which are pro-social,
and have the potential to create a whole new field, you should consider
building formal (mathematical) bridges between results on overfitting
in machine learning, and problems in economics, political science, management science, operations research, and elsewhere7.
This is a goldmine waiting to be tapped. (I might actually be suggesting here that we should invent the field of psychohistory, and that overfitting phenomena will have a big role in that field.)
The more our social systems break due to the strong version of Goodhart's law,
the less we will be able to take the concerted rational action required to fix them.
Hopefully naming, and better understanding, the phenomenon will help push in the opposite direction.
1 Accuracy is not differentiable, which makes it impossible to target directly with vanilla gradient descent training. During training it is usually replaced with a differentiable proxy, the softmax cross-entropy loss.
There are blackbox training methods which can directly target accuracy, but they are inefficient and rarely used.
2 This modern phrasing is due to Marilyn Strathern. Goodhart originally stated the observation as: any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
3 This glosses over a great deal of variation. For instance, there is an entire subfield studying the qualitative differences in overfitting between underparameterized, critically parameterized, and overparameterized models.
Despite this variation, the core observation — that when we train on a proxy our target gets better for a while, but then grows worse — holds broadly.
4 This is about more than just overfitting. Overfitting describes the proxy becoming better than the goal, not the goal growing worse in an absolute sense. There are other related, but not identical, concepts, such as perverse incentives, Campbell's law, the Streisand effect, the law of unintended consequences, the Jevons paradox, and the concept of negative externalities. Goodhart's curse is perhaps the closest. However, the definition of Goodhart's curse includes not just this phenomenon but also a specific mechanism, and that mechanism is the wrong one8. Edited 2022-11-09: Andrew Hundt suggested that similar observations, that optimization is not always desirable, have been made in the social sciences, and gave "The New Jim Code" and "Weapons of Math Destruction" as specific examples. Kiran Vodrahalli pointed out the connection to robust optimization and "the price of robustness". Leo Gao pointed me to a recent paper which uses the descriptive term "overoptimization" for this phenomenon, which I like.
5 I also considered calling this the strong law of unintended consequences: not just because there will be unintended side effects, but because the more efficiently you accomplish the task, the more those side effects will act against your original goal.
6 Note that for a sufficiently powerful AI, the limits on its capabilities may be set by the laws of physics, rather than by its compute budget or the size of its training dataset. So if you are worried about misaligned AGI, this mitigation may not provide any comfort.
7 For instance, take PAC-Bayes bounds from statistical learning theory, and use them to predict the optimal amount of power a union should have in order to maximize the wealth of workers in an industry.
Or, estimate the spectrum of candidate-controllable and uncontrollable variables in political contests, to predict points of political breakdown. (I'm blithely suggesting these examples as if they would be easy, and are well formed in their description.
或者,估计政治竞争中可控和不可控变量的谱,以预测政治崩溃的节点。(我无忧无虑地建议这些例子,仿佛它们会很简单,并且描述清晰。
Of course, neither is true — actually doing this would require hard work and brilliance in some ratio.)
8 The definition of Goodhart's curse includes the optimizer's curse as its causal mechanism. That is where the word "curse" in its name comes from. If a proxy is an imperfect measure of a goal, the optimizer's curse explains why optimizing the proxy will find points where the proxy looks unusually good, and why the gap between the proxy and the goal will grow. It does not explain why optimizing the proxy would make the goal grow worse in an absolute sense. That is, the optimizer's curse provides a motivation for Goodhart's law. It does not provide a motivation for the strong version of Goodhart's law. (As I briefly discuss elsewhere in the post, one common causal mechanism for the goal growing worse is that a model's expressivity is too closely matched to the complexity of the task it is performing. This is a very active area of research though, and our understanding is both incomplete and evolving.)
Thank you to Asako Miyakawa and Katherine Lee for providing feedback on earlier drafts of this post.
BibTeX citation:
@misc{sohldickstein20221106,
author = {Sohl-Dickstein, Jascha},
title = {{ Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law }},
howpublished = "\url{https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html}",
date = {2022-11-06}
}