
Ten short guidelines for clear thinking and collaborative truth-seeking, followed by extensive discussion of what exactly they mean and why Duncan thinks they're an important default guideline.

29Elizabeth
I wish this had been called "Duncan's Guidelines for Discourse" or something like that. I like most of the guidelines given, but they're not consensus. And while I support Duncan's right to block people from his posts (and agree with him on discourse norms far more than with the people he blocked), it means that people who disagree with him on the rules can't make their case in the comments. That feels like an unbalanced playing field to me.
18Screwtape
I think this, or something like this, should be in a place of prominence on LessWrong. The Best Of collection might not be the place, but it's the place I can vote on, so I'd like to vote for it here.

I used "or something like this" above intentionally. The format of this post — an introduction of why these guidelines exist, short one or two sentence explanations of the guideline, and then expanded explanations with "ways you might feel when you're about to break the X Guideline" — is excellent. It turns each guideline into a mini-lesson, which can be broken out and referenced independently. The introduction gives context for them all to hang together. The format is A+, fighting for S tier.

Why "something like this" instead of "this, exactly this" then? Each individual guideline is good, but they don't feel like they're the only set. I can imagine swapping basically any of them other than 0 and 1 out for something different and having something I liked just as much. I still look at 5 ("Aim for convergence on truth, and behave as if your interlocutors are also aiming for convergence on truth") and internally wince. I imagine lots of people read it, mostly agreed with it, but wanted to replace or quibble with one or two of the guidelines, and from reading the comments there wasn't a consensus on which line was out of place. That seems like a good sign.

It's interesting to me to contrast it with Elements Of Rationalist Discourse. Elements doesn't resonate as much with me, and while some of that is that Elements is not laid out as cleanly, I also don't agree with the list the same way. And yet, Elements was also upvoted highly. The people yearn for guidelines, and there wasn't a clear favourite. Someday I might try my own hand at the genre, and I still consider myself to owe an expansion on my issues with 5.

I'm voting for this to be in the Best Of LessWrong collection. If there was a process to vote to make this or at least the introduction and Guidelines, In Brief in
180 · Why is o1 so deceptive?
abramdemski, Sahil
24
81 · 7+ tractable directions in AI control
Julian Stastny, ryan_greenblatt


gwern*Ω13313

The Meta-LessWrong Doomsday Argument (MLWDA) predicts long AI timelines and that we can relax:

LessWrong was founded in 2009 (16 years ago), and there have been 44 mentions of the 'Doomsday argument' prior to this one, and it is now 2025, at 2.75 mentions per year.

By the Doomsday argument, we medianly expect mentions to stop after 44 additional mentions over 16 additional years, i.e. in 2041. (And our 95% CI on that 44 would then be +1 mention to +1,760 mentions, corresponding to late-2027 AD to 2665 AD.)

By a curious coincidence, double-checking to see if really no one had made a meta-DA before, it turns out that Alexey Turchin has made a meta-DA as well about 7 years ago, calculating that

If we assume 1993 as the beginning of a large DA-Doomers reference class, and it is 2018 now (at the moment of writing this text), the age of the DA-Doomers class is 25 years. Then, with 50% probability, the reference class of DA-Doomers will disappear in 2043, according to Gott’s equation! Interestingly, the dates around 2030–2050 appear in many different predictions of the singularity or the end of the world (Korotayev 2018; Turchin & Denkenberger 2018b; Kurzweil 2006).

His estimate of 2043 is surprisingly close to 2041.

We offer no explanation as to why this numerical consilience of meta-DA calculations has happened; we attribute their success, as all else, to divine benevolence.
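For anyone who wants to check the arithmetic, here is a minimal sketch of the Gott-style calculation behind the figures above. The mention count and rate are taken from the comment; the exact interval endpoints depend on which convention you use for the 95% bound, so they do not match the quoted figures to the digit.

```python
# Gott-style Doomsday-argument arithmetic for the MLWDA figures above.
# Assumption: this mention is a "random" draw from all LW mentions of the
# Doomsday argument, so the fraction already observed is ~Uniform(0, 1).

past_mentions = 44
past_years = 16                      # LessWrong founded 2009, writing in 2025
rate = past_mentions / past_years    # 2.75 mentions per year

# Median: with probability 1/2 we are past the halfway point, so we expect
# roughly as many future mentions as past ones.
median_end = 2025 + past_mentions / rate           # 2041.0

# 95% interval: the observed fraction lies in (0.025, 0.975), so the
# future/past ratio lies between 1/39 and 39.
low_end = 2025 + (past_mentions / 39) / rate       # ~2025.4 (vs. "late-2027" above)
high_end = 2025 + (past_mentions * 39) / rate      # ~2649 (vs. "2665 AD" above)
print(median_end, low_end, high_end)
```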

1Robert Cousineau
I think taking into account the Meta-Meta-LessWrong Doomsday Analysis (MMLWDA) reveals an even deeper truth: your calculation fails to account for the exponential memetic acceleration of doomsday-reference-self-reference. You've correctly considered that before your post, there were 44 mentions in 16 years (2.75/year); however, now you've created the MLWDA argument - noticeably more meta than previous mentions. This meta-ness increase is quite likely to trigger cascading self-referential posts (including this one).

The correct formulation should incorporate the Meta-Meta-Carcinization Principle (MMCP): all online discourse eventually evolves into recursive self-reference at an accelerating rate. Given my understanding of historical precedent from similar rat and rat adjacent memes, I'd estimate approximately 12-15 direct meta-responses to your post within the next month alone, and see no reason to expect the exponential to turn sigmoid on timescales that render my below argument unlikely.

This actually implies a much sooner endpoint distribution - the discourse will become sufficiently meta by approximately November 2027 that it will collapse into a singularity of self-reference, rendering further mentions both impossible and unnecessary.
5gwern
However, you can't use this argument because unlike the MLWDA, where I am arguably a random observer of LW DA instances (the thought was provoked by Michael Nielsen linking to Shalizi's notes on Mesopotamia and me thinking that the temporal distances are much less impressive if you think of them in terms of 'nth human to live', which immediately reminded me of DA and made me wonder if anyone had done a 'meta-DA', and LW simply happened to be the most convenient corpus I knew of to accurately quantify '# of mentions' as tools like Google Scholar or Google N-Grams have a lot of issues - I have otherwise never taken much of an interest in the DA and AFAIK there have been no major developments recently), you are in a temporally privileged position with the MMLWDA, inasmuch as you are the first responder to my MLWDA right now, directly building on it in a non-randomly-chosen-in-time fashion. Thus, you have to appeal purely to non-DA grounds like making a parametric assumption or bringing in informative priors from 'similar rat and rat adjacent memes', and that's not a proper MMLWDA. That's just a regular prediction. Turchin actually notes this issue in his paper, in the context of, of course, the DA and why the inventor Carter could not make a Meta-DA (but he and I could):

No question that e.g. o3 lying and cheating is bad, but I’m confused why everyone is calling it “reward hacking”.

Let’s define “reward hacking” (a.k.a. specification gaming) as “getting a high RL reward via strategies that were not desired by whoever set up the RL reward”. Right?

If so, well, all these examples on X etc. are from deployment, not training. And there’s no RL reward at all in deployment. (Fine print: Maybe there are occasional A/B tests or thumbs-up/down ratings in deployment, but I don’t think those have anything to do with why o3 lies and cheats.) So that’s the first problem.

Now, it’s possible that, during o3’s RL CoT post-training, it got certain questions correct by lying and cheating. If so, that would indeed be reward hacking. But we don’t know if that happened at all. Another possibility is: OpenAI used a cheating-proof CoT-post-training process for o3, and this training process pushed it in the direction of ruthless consequentialism, which in turn (mis)generalized into lying and cheating in deployment. Again, the end-result is still bad, but it’s not “reward hacking”.

Separately, sycophancy is not “reward hacking”, even if it came from RL on A/B tests, unless the average user doesn’t like sycophancy. But I’d guess that the average user does like quite high levels of sycophancy. (Remember, the average user is some random high school jock.)

Am I misunderstanding something? Or are people just mixing up “reward hacking” with “ruthless consequentialism”, since they have the same vibe / mental image?

I agree people often aren't careful about this.

Anthropic says

During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments . . . . This undesirable special-casing behavior emerged as a result of "reward hacking" during reinforcement learning training.

Similarly OpenAI suggests that cheating behavior is due to RL.

5Steven Byrnes
Thanks! I’m now much more sympathetic to a claim like “the reason that o3 lies and cheats is (perhaps) because some reward-hacking happened during its RL post-training”. But I still think it’s wrong for a customer to say “Hey I gave o3 this programming problem, and it reward-hacked by editing the unit tests.”
4Cole Wyeth
Yes, you’re technically right. 

I think that using 'reward hacking' and 'specification gaming' as synonyms is a significant part of the problem. I'd argue that for LLMs, which can learn task specifications not only through RL but also through prompting, it makes more sense to keep those concepts separate, defining them as follows:

  • Reward hacking—getting a high RL reward via strategies that were not desired by whoever set up the RL reward.
  • Specification gaming—behaving in a way that satisfies the literal specification of an objective without achieving the outcome intended by whoever specifi
... (read more)
4cubefox
There was a recent in-depth post on reward hacking by @Kei (e.g. referencing this) who might have more to say about this question. Though I also wanted to just add a quick comment about this part: It is not quite the same, but something that could partly explain lying is if models get the same amount of reward during training, e.g. 0, for a "wrong" solution as they get for saying something like "I don't know". Which would then encourage wrong solutions insofar as they at least have a potential of getting reward occasionally when the model gets the expected answer "by accident" (for the wrong reasons). At least something like that seems to be suggested by this: Source: Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
2faul_sname
So I think what's going on with o3 isn't quite standard-issue specification gaming either. It feels like, when I use it, if I ever accidentally say something which pattern-matches something which would be said in an eval, o3 exhibits the behavior of trying to figure out what metric it could be evaluated by in this context and how to hack that metric. This happens even if the pattern is shallow and we're clearly not in an eval context; I'll try to see if I can get a repro case which doesn't have confidential info.
2Kei
It's pretty common for people to use the terms "reward hacking" and "specification gaming" to refer to undesired behaviors that score highly as per an evaluation or a specification of an objective, regardless of whether that evaluation/specification occurs during RL training. I think this is especially common when there is some plausible argument that the evaluation is the type of evaluation that could appear during RL training, even if it doesn't actually appear there in practice. Some examples of this:

  • OpenAI described o1-preview succeeding at a CTF task in an undesired way as reward hacking.
  • Anthropic described Claude 3.7 Sonnet giving an incorrect answer aligned with a validation function in a CoT faithfulness eval as reward hacking. They also used the term when describing the rates of models taking certain misaligned specification-matching behaviors during an evaluation after being fine-tuned on docs describing that Claude does or does not like to reward hack.
  • This relatively early DeepMind post on specification gaming and the blog post from Victoria Krakovna that it came from (which might be the earliest use of the term specification gaming?) also gives a definition consistent with this.

I think the literal definitions of the words in "specification gaming" align with this definition (although interestingly not the words in "reward hacking"). The specification can be operationalized as a reward function in RL training, as an evaluation function or even via a prompt. I also think it's useful to have a term that describes this kind of behavior independent of whether or not it occurs in an RL setting. Maybe this should be reward hacking and specification gaming. Perhaps as Rauno Arike suggests it is best for this term to be specification gaming, and for reward hacking to exclusively refer to this behavior when it occurs during RL training. Or maybe due to the confusion it should be a whole new term entirely. (I'm not sure that the term "ruthless cons

I have serious, serious issues with avoidance. I would like some advice on how to improve, as I suspect it is significantly holding me back.

Some examples of what I mean

  • I will not respond to an email or an urgent letter for weeks at a time, even while it causes me serious anxiety
  • I will procrastinate starting work in the morning, sometimes leading to me doing nothing at all by the afternoon
  • I will avoid looking for jobs or other opportunities, I have very strong avoidance here, but I'm not sure why
  • I will make excuses to avoid meetings and social situations very often
  • I will (unconsciously) avoid running experiments that might falsify a hypothesis I am attached to. I have only realised this very recently, and am consciously trying to do better, but it is somewhat shocking to me that my avoidance patterns even manifest here.
4Garrett Baker
I recommend you read at least the first chapter of Getting Things Done, and do the corresponding exercises. In particular, this one, which he uses to provide evidence that his model of productivity is correct.
4p.b.
What helps me to overcome the initial hurdle to start doing work in the morning:

  1. Write a list of the stuff you have to do the next day.
  2. Make it very fine-grained, with single tasks (especially the first few) being basically no effort.
  3. Tick them off one by one.

Also:

  1. Tell people what you have to do and when you are going to do it and that you have done it. Like, a colleague, or your team, or your boss.
  2. Do stuff with other people. Either actually together, like pair programming, or closely intertwined.

I think it also helps to take something you are good at and feel good about and in that context take responsibility for something and/or interact with/present to people. Only this kind of social success will build the confidence to overcome social anxiety, but directly trying to do the social stuff you feel worst about usually backfires (at least for me).
3Seth Herd
Read about Ugh fields on LW. Edit: this doesn't include practical advice, but a theoretical understanding of the issues at play is often helpful in implementing practical strategies.
2cubefox
See here. (Perhaps also relevant: PDA)
2trevor
Visualize yourself doing the thing until you do it. Note that this comes with substantial risk of making you avoidant/averse to visualizing yourself doing the thing until you do it; this is a recursive, procedurally generated process and you should expect to need to keep on your toes in order to succeed. Aversion factoring is a good resource to start with, and Gödel, Escher, Bach is a good resource for appreciating the complexity required for maintenance and the inadequacy of simple strategies.
1jam_brand
I've had similar issues downstream of what I'd somehow failed to realize was a clinically-significant level of anxiety, so that's something to maybe consider checking into.
1p
If you haven't already, talk to a guy! (typically a therapist but doesn't have to be) I have something like this but for decisions, where I will avoid making decisions for mysterious reasons (we figured out it's because I can't be sure it'd be pareto optimal, among other reasons). I now notice more often when I'm doing this, and correct more gracefully.
1shawnghu
1. If this would not obviously make things worse, be more socially connected with people who have expectations of you; not necessarily friends but possibly colleagues or people who simply assume you should be working at times and get feedback about that in a natural way. It's possible that the prospect of this is anxiety-inducing and would be awful but that it would not actually be very awful.
2. Recognize that you don't need to do most things perfectly or even close to it, and as a corollary, you don't need to be particularly ready to handle tasks even if they are important. You can handle an email or an urgent letter without priming yourself or being in the right state of mind. The vast majority of things are this way.
3. Sit in the start position of your task, as best as you can operationalize that (e.g., navigate to the email and open it, or hit the reply button and sit in front of it), for one minute, without taking your attention off of the task. Progress the amount of time upwards as necessary/possible. (One possible success-mode from doing this is that you get bored of being in this position or you become aware that you're tired of the thing not being done. (You would hope your general anxiety about the task in day-to-day life would achieve this for you, but it's not mechanically optimized enough to.) Another possible success-mode is that the immediate feelings you have about doing the task subside.)
4. Beta-blockers.
1Sergii
I have similar issues, severity varies over time. If I am in a bad place, the things that help best:
- Taking care of mental health. I do CBT when I'm in worse shape, and take SSRIs. YMMV. Both getting diagnosed and getting treated are important. This also includes regular exercise and good sleep. What you have described might be (although does not have to be) related to depression, anxiety, or attention disorders.
- Setting a timer for a short time, which can be as short as 1 minute, and doing one of the avoided tasks for just that 1 minute. It kind of "breaks the spell" for me.
- Journaling, which helps to "debug" the problems, and in most cases leads to writing down plans / interventions / resolutions.

The speed of scaling pretraining will go down ~3x in 2027-2029, reducing probability of crossing transformative capability thresholds per unit of time after that point, if they'd not been crossed yet by then.

GPT-4 was trained in 2022 at ~2e25 FLOPs, Grok-3 and GPT-4.5 were trained in 2024 at ~3e26 FLOPs (or twice that in FP8) using ~100K H100s training systems (which cost ~$4-5bn to build). In 2026, Abilene site of Crusoe/Stargate/OpenAI will have 400K-500K Blackwell chips in NVL72 racks (which cost ~$22-35bn to build), enough to train a ~4e27 FLOPs model. Thus recently there is a 2-year ~6x increase in cost for a frontier training system and a 2-year ~14x increase in compute. But for 2028 this would mean a $150bn training system (which is a lot, so only borderline plausible), and then $900bn in 2030. At that point AI companies would need to either somehow figure out how to pool resources, or pretraining will stop scaling before 2030 (assuming AI still doesn't hit a transformative commercial success).

If funding stops increasing, what we are left with is the increase in price performance of ~2.2x every 2 years, which is ~3.3x slower than the 2-year ~14x at the current pace. (I'm estimating price performance for a whole datacenter or at least a rack, rather than only for chips.)
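A quick sanity check on the arithmetic, using the multipliers quoted above. The $25bn midpoint for the 2026 system and the naive compounding of the recent 2-year multipliers are my assumptions for the sketch, not claims from the comment.

```python
import math

# Anchor figures quoted in the comment above.
cost_2026_bn = 25          # Abilene-class Blackwell system, within the ~$22-35bn range
cost_mult_2yr = 6          # recent ~6x per 2 years growth in training-system cost
compute_mult_2yr = 14      # recent ~14x per 2 years growth in training compute
price_perf_mult_2yr = 2.2  # whole-datacenter price-performance gain per 2 years

# Naive extrapolation: assume the same 2-year cost multiplier keeps holding.
print(f"2028 system: ~${cost_2026_bn * cost_mult_2yr:.0f}bn")       # ~$150bn
print(f"2030 system: ~${cost_2026_bn * cost_mult_2yr**2:.0f}bn")    # ~$900bn

# If funding flattens, compute growth falls back to price-performance alone;
# in orders-of-magnitude-per-year terms that is the quoted ~3.3x slowdown.
slowdown = math.log(compute_mult_2yr) / math.log(price_perf_mult_2yr)
print(f"slowdown in pretraining scaling speed: ~{slowdown:.1f}x")   # ~3.3x
```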

4ryan_greenblatt
We also hit limits on fab capacity without constructing a bunch more fabs around a similar time.

----------------------------------------

Price performance of 2.2x per year feels aggressive to me. The chip-only trend is more like 1.35x / year from my understanding. Do you think the ML chip trend is much faster than this? I don't see how you could have a 2.2x price drop per year longer term without chip price performance following, as eventually chips will be the bottleneck even if other costs (e.g., interconnect, building datacenters) are dropping.

Edit: this was 2.2x every 2 years, I was just confused.
6Vladimir_Nesov
If I'm reading the relevant post correctly, it's 1.35x FP32 FLOP/s per GPU per year (2x in 2.3 years), which is not price-performance[1]. The latter is estimated to be 1.4x FP32 FLOP/s per inflation-adjusted dollar (2x in 2.1 years).

It's 2.2x per 2 years, which is 1.5x per year, though that's still more than 1.4x per year. I'm guessing packaging is part of this, and also Nvidia is still charging a giant margin for the chips, so the chip manufacturing cost is far from dominating the all-in datacenter cost. This might be enough to sustain 1.5x per year a bit beyond 2030 (the discrepancy of 1.5/1.4 only reaches 2x after 10 years). But even if we do get back to 1.4x/year, that only turns the 3.3x reduction in speed of pretraining scaling into a 3.9x reduction in speed, so the point stands.

----------------------------------------

1. Incidentally, the word "GPU" has recently lost all meaning, since Nvidia started variably referring to either packages with multiple compute dies in them as GPUs (in Blackwell), or to individual compute dies (in Rubin). Packaging will be breaking trends for FLOP/s per package, but also FLOP/s per compute die; for example, Rubin seems to derive significant advantage per compute die from introducing separate smaller I/O dies, so that the reticle-sized compute dies become more specialized and their performance when considered in isolation might improve above trend. ↩︎
3ryan_greenblatt
Oh oops, I just misread you, didn't realize you said 2.2x every 2 years, nvm.

There's an analogy between the Zurich r/changemyview curse of evals and the METR/Epoch curse of evals. You do this dubiously ethical (according to more US-pilled IRBs, or according to more paranoid/pure AI safety advocates) measuring/elicitation project because you might think the world deserves to know. But you had to do dubiously ethical experimentation on unconsenting reddizens / help labs improve capabilities in order to get there -- but the catch is, you only come out net positive if the world chooses to act on this information.


Popular Comments

I think your discussion (and Epoch's discussion) of the CES model is confused, as you aren't taking into account the possibility that we're already bottlenecking on compute or labor. That is, I think you're making some assumption about the current marginal returns which is non-obvious and, more strongly, would be an astonishing coincidence given that compute is scaling much faster than labor. In particular, consider a hypothetical alternative world where they have the same amount of compute, but there is only 1 person (Bob) working on AI and this 1 person is as capable as the median AI company employee and also thinks 10x slower. In this alternative world they could also say "Aha, you see because ρ≈0.4, even if we had billions of superintelligences running billions of times faster than Bob, AI progress would only go up to around 4x faster!" Of course, this view is absurd because we're clearly operating >>4x faster than Bob. So, you need to make some assumptions about the initial conditions.

----------------------------------------

This perspective implies an (IMO) even more damning issue with this exact modeling: the CES model is symmetric, so it also implies that additional compute (without labor) can only speed you up so much. I think the argument I'm about to explain strongly supports a lower value of ρ or some different functional form. Consider another hypothetical world where the only compute they have is some guy with an abacus, but AI companies have the same employees they do now. In this alternative world, you could also have just as easily said "Aha, you see because ρ≈0.4, even if we had GPUs that could do 1e15 FLOP/s (far faster than our current rate of 1e-1 fp8 FLOP/s), AI progress would only go around 4x faster!" Further, the availability of compute for AI experiments has varied by around 8 orders of magnitude over the last 13 years! (AlexNet was 13 years ago.) The equivalent (parallel) human labor focused on frontier AI R&D has varied by more like 3 or maybe 4 orders of magnitude. (And the effective quality-adjusted serial labor, taking into account parallelization penalties etc, has varied by less than this, maybe by more like 2 orders of magnitude!)

Ok, but can we recover low-substitution CES? I think the only maybe consistent recovery (which doesn't depend on insane coincidences about compute vs labor returns) would imply that compute was the bottleneck (back in AlexNet days) such that scaling up labor at the time wouldn't yield ~any progress. Hmm, this doesn't seem quite right. Further, insofar as you think scaling up just labor when we had 100x less compute (3-4 years ago) would have still been able to yield some serious returns (seems obviously true to me...), then a low-substitution CES model would naively imply we have a sort of compute overhang where we can speed things up by >100x using more labor (after all, we've now added a bunch of compute, so time for more labor). Minimally, the CES view predicts that AI companies should be spending less and less of their budget on compute as GPUs are getting cheaper. (Which seems very false.)

----------------------------------------

Ok, but can we recover a view sort of like the low-substitution CES view where speeding up our current labor force by an arbitrary amount (also implying we could only use our best employees etc) would only yield ~10x faster progress?

I think this view might be recoverable with some sort of non-symmetric model where we assume that labor can't be the bottleneck in some wide regime, but compute can be the bottleneck. (As in, you can always get faster progress by adding more compute, but the multiplier on top of this from adding labor caps out at some point which mostly doesn't depend on how much compute you have. E.g., maybe this could be because at some point you just run bigger experiments with the same amount of labor, and doing a larger number of smaller experiments is always worse and you can't possibly design the experiment better. I think this sort of model is somewhat plausible.) This model does make a somewhat crazy prediction where it implies that if you scale up compute and labor exactly in parallel, eventually further labor has no value. (I suppose this could be true, but seems a bit wild.)

----------------------------------------

Overall, I'm currently quite skeptical that arbitrary improvements in labor yield only small increases in the speed of progress. (E.g., an upper limit of 10x faster progress.) As far as I can tell, this view either privileges exactly the level of human researchers at AI companies or implies that using only a smaller number of weaker and slower researchers wouldn't alter the rate of progress that much. In particular, consider a hypothetical AI company with the same resources as OpenAI except that they only employ aliens whose brains work 10x slower and for which the best researcher is roughly as good as OpenAI's median technical employee. I think such an AI company would be much slower than OpenAI, maybe 10x slower (partly from just lower serial speed and partly from reduced capabilities). If you think such an AI company would be 10x slower, then by symmetry you should probably think that an AI company with 10x faster employees who are all as good as the best researchers should perhaps be 10x faster or more.[1] It would be surprising if the returns stopped at exactly the level of OpenAI researchers. And the same reasoning makes 100x speedups once you have superhuman capabilities at massive serial speed seem very plausible.

----------------------------------------

1. I edited this to be a hopefully more clear description. ↩︎
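A toy illustration of the initial-conditions point, assuming the standard two-input CES form F(L, C) = (a·L^ρ + (1−a)·C^ρ)^(1/ρ) with ρ < 0 (gross complements). The share parameter and ρ value here are made up for illustration and are not Epoch's fitted values.

```python
# Toy CES "research progress" function of labor L and compute C:
#   F(L, C) = (a * L**rho + (1 - a) * C**rho) ** (1 / rho)
# rho < 0 makes the inputs gross complements, so scaling only one input
# eventually hits a bottleneck. a and rho below are illustrative only.

def ces(L, C, a=0.5, rho=-1.5):
    return (a * L**rho + (1 - a) * C**rho) ** (1 / rho)

# The cap on labor-only speedups depends entirely on the baseline L/C mix:
for L0, C0 in [(1.0, 1.0), (1.0, 100.0)]:
    cap = ces(1e12, C0) / ces(L0, C0)   # effectively infinite labor
    print(f"baseline L={L0}, C={C0}: labor-only speedup cap ~ {cap:.1f}x")

# With a balanced baseline the cap is ~1.6x; with a compute-rich baseline it
# is ~100x. So "more labor only buys ~4x" is an assumption about where we
# currently sit on the curve, not a consequence of the CES form itself.
```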
How are these ideas different from "signal" and "noise"?
Need "what is good" questions where humans can reliably check answers (theorems, or tractably checkable formalization challenges). My favorite threads I'd like to see boosted: Wentworth, Kosoy, Ngo, Leake, Byrnes, Demski, Eisenstat. * John Wentworth (natural latents, and whatever he's interested in right now) * Vanessa Kosoy and co (non monotonic IBP, superimitation, and W.S.I.I.R.N.) * Richard Ngo (scale free agency/"marriages?" curiosity, and W.H.I.I.R.N.) * Tamsin Leake (qaci as a maybe slightly more specific superimitation, less of W.S.I.I.R.N. but maybe some) * Steven Byrnes (I'm not up to date on, so just W.H.I.I.R.N.) * Sam Eisenstat (had a cool but hard to hear talk on wentworth-esque stuff at MAISU, W.H.I.I.R.N.) * Abram Demski (W.H.I.I.R.N.) Current models are like actors, you talk to the character. I hope nobody gets mislead catastrophically by thinking you can outsource a hard to check part of things.

Recent Discussion

This post examines the virtues of hope, optimism, and trust. It is meant mostly as an exploration of what other people have learned about these virtues, rather than as me expressing my own opinions about them, though I’ve been selective about what I found interesting or credible, according to my own inclinations. I wrote this not as an expert on the topic, but as someone who wants to learn more about it. I hope it will be helpful to people who want to know more about these virtues and how to nurture them.

What are these virtues?

These virtues have in common a sort of “look on the bright side” / “expect the best” approach to life. But there are a number of ways to interpret this, and if...

Not really. Robert Anton Wilson's description is more on-point:

Let me differentiate between scientific method and the neurology of the individual scientist. Scientific method has always depended on feedback [or flip-flopping as the Tsarists call it]; I therefore consider it the highest form of group intelligence thus far evolved on this backward planet. The individual scientist seems a different animal entirely. The ones I've met seem as passionate, and hence as egotistic and prejudiced, as painters, ballerinas or even, God save the mark, novelists. My hop

... (read more)


I’ve been thinking recently about what sets apart the people who’ve done the best work at Anthropic.

You might think that the main thing that makes people really effective at research or engineering is technical ability, and among the general population that’s true. Among people hired at Anthropic, though, we’ve restricted the range by screening for extremely high-percentile technical ability, so the remaining differences, while they still matter, aren’t quite as critical. Instead, people’s biggest bottleneck eventually becomes their ability to get leverage—i.e., to find and execute work that has a big impact-per-hour multiplier.

For example, here are some types of work at Anthropic that tend to have high impact-per-hour, or a high impact-per-hour ceiling when done well (of course this list is extremely non-exhaustive!):

  • Improving tooling, documentation, or dev
...

and computer use started off as a fraction of one person’s time,

Curious when this was. When was the earliest point that someone was working on computer use? When was the latest point that ONLY one person was?

Suppose that I have some budget set aside for philanthropic funding, say $1,000, but I think there are big returns to scale, so that it would be >1,000x better if I had $1,000,000.[1]

What are my best options for some bets I can make to get some chance of turning that $1,000 into $1,000,000?

I imagine the bets are generally better the more positive their EV is (which in practice means the less negative the EV is), the easier they are to find (like if they're standardised), and the better their tax treatment is, especially if you repeat them multiple times (maybe they can be done by DAFs).

  1. ^

    NB: I don't believe this

4Answer by TsviBT
I think occasionally some lotteries are positive or neutral-ish EV, when the jackpots are really big (like >$1 billion)? Not sure. You have to check the taxes and the payment schedules etc.

Hard to buy tickets at large scale often I think.
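For a sense of when a big jackpot gets close to neutral EV, here is a rough back-of-the-envelope sketch. Every number below is a hypothetical placeholder, not a real lottery's odds, tax rates, or payout terms.

```python
# Rough EV of one lottery ticket. All figures are hypothetical placeholders.
ticket_price = 2.0
jackpot = 1.5e9            # advertised (annuity) jackpot
lump_sum_fraction = 0.5    # cash option vs. annuity
tax_rate = 0.40            # combined taxes on winnings
odds = 1 / 300e6           # chance of hitting the jackpot
expected_co_winners = 1.5  # jackpot is split if several tickets match

net_jackpot = jackpot * lump_sum_fraction * (1 - tax_rate) / expected_co_winners
ev = odds * net_jackpot - ticket_price
print(f"EV per ticket: {ev:+.2f} USD")  # about -1 USD here; smaller prizes (ignored) claw back a little
```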

3harfe
Note that GWWC is shutting down their donor lottery, among other things: https://forum.effectivealtruism.org/posts/f7yQFP3ZhtfDkD7pr/gwwc-is-retiring-10-initiatives
1Answer by Knight Lee
Given that you're not satisfied with the Donor Lottery and want to gamble against "non-philanthropic funds," you probably want to pool your money with all other gambling philanthropists before making a single large bet. A single large bet prevents your winnings from averaging out against other philanthropists' losses (resulting in a worse outcome than just using the Donor Lottery). If there are many gambling philanthropists, you can't go to the casino, because they probably don't have enough money to multiply all your funds by 1000x. You have to make some kind of extreme bet on the stock market. If you want to ensure that you win the bet in at least one branch of the multiverse (haha), then you might roll a quantum dice and let it decide which stock you bet on (or against).

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

Goods

Each good is assigned a value for tax purposes:

Good            Value
Cockatrice Eye  6gp
Dragon Head     14gp
Lich Skull      10gp
Unicorn Horn    7gp
Zombie Arm      2gp

Tax Brackets

Depending on the total value of all goods you have, you determine a tax bracket:

Total Value    Tax Rate
<30gp          20%
30-59gp        30%
60-99gp        40%
100-299gp      50%
300gp+[1]      60%

Your taxes due are equal to your Tax Rate multiplied by the total value of your goods.

So if you have two Lich Skulls (20gp), your tax rate is 20% and you will owe 4gp of taxes.
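A minimal sketch of that computation, taking the good values and bracket thresholds from the tables above; the helper names are just illustrative.

```python
# Tax calculation as described in the ruleset above.
GOOD_VALUES = {
    "Cockatrice Eye": 6, "Dragon Head": 14, "Lich Skull": 10,
    "Unicorn Horn": 7, "Zombie Arm": 2,
}

def tax_rate(total_value: int) -> float:
    if total_value < 30:   return 0.20
    if total_value < 60:   return 0.30
    if total_value < 100:  return 0.40
    if total_value < 300:  return 0.50
    return 0.60

def taxes_due(goods: list[str]) -> float:
    total = sum(GOOD_VALUES[g] for g in goods)
    return tax_rate(total) * total

print(taxes_due(["Lich Skull", "Lich Skull"]))  # 20gp total -> 20% rate -> 4.0gp due
```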

If you have three Lich...

2abstractapplic
Reflections on my performance: There's an interesting sense in which we all failed this one. Most other players used AI to help them accomplish tasks they'd personally picked out; I eschewed AI altogether and constructed my model with brute force and elbow grease; after reaching a perfect solution, I finally went back and used AI correctly, by describing the problem on a high level (manually/meatbrainedly distilled from my initial observations) and asking the machine demiurge what approach would make most sense[1]. From this I learned about the fascinating concept of Symbolic Regression and some associated python libraries, which I eagerly anticipate using to (attempt to) steamroll similarly-shaped problems.

(There's a more mundane sense in which I specifically failed this one, since even after building a perfect input-output relation and recognizing the two best archetypes as rebatemaxxing and corpsemaxxing, I still somehow fell at the last hurdle and failed to get a (locally-)optimal corpsemaxxing solution; if the system had followed the original plan, I'd be down a silver coin and up a silver medal. Fortunately for my character's fortunes and fortune, Fortune chose to smile.)

Reflections on the challenge: A straightforward scenario, but timed and executed flawlessly. In particular, I found the figuring-things-out gradient (admittedly decoupled from the actually-getting-a-good-answer gradient) blessedly smooth, starting with picking up on the zero-randomness premise[2] and ending with the fun twist that the optimal solution doesn't involve anything being taxed at the lowest rate[3]. I personally got a lot out of this one: for an evening's exacting but enjoyable efforts, I learned about an entire new form of model-building, about the utility and limits of modern AI, and about Banker's Rounding.

I vote four-out-of-five for both Quality and Complexity . . . though I recognize that such puzzle-y low-variance games are liable to have higher variance in how they're

I had terrible luck with symbolic regression, for what it's worth.

1Yonge 
Thank you for posting this. Getting a very good or perfect answer felt a lot easier than most; however, getting from a very good answer to a perfect answer seemed more difficult than most. I identified a very good answer very quickly just by looking for combinations that were present in the dataset. It was then rather frustrating to make a lot of progress in untangling the rules and still be unable to find a better solution than the first one I found. Overall I would rate it as difficulty = 2/5, playability = 2/5, where 3 is an average D&D puzzle.
2simon 
And just now I thought, wait, wouldn't this sometimes round to 10, but no, an AI explained to apparently-stupid me again that since it's a 0.25 tax rate on integer goods, fractional gold pieces before rounding (where not a multiple of 0.1) can only be 0.25, which rounds down to 2 silver, or 0.75, which rounds up to 8 silver. Which makes it all the more surprising that I didn't notice this pattern.
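The 2-vs-8-silver pattern is round-half-to-even, i.e. the "Banker's Rounding" mentioned a couple of comments up; Python's built-in round() happens to behave the same way, so it makes a quick demo.

```python
# Round-half-to-even ("banker's rounding"): halves go to the nearest even digit.
# 0.25gp = 2.5 silver and 0.75gp = 7.5 silver are the only fractional cases here.
print(round(2.5), round(7.5))  # -> 2 8
```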

1. Introduction 

Should we expect the future to be good? This is an important question for many reasons. One such reason is that the answer to this question has implications for what our intermediate goals should be. If we should expect the future to be good, then it would be relatively more important for us to focus on ensuring that we survive long into the future, e.g. by working on mitigating extinction risks. If we should not expect the future to be good, then it would be relatively more important for us to focus on mitigating risks of astronomical suffering. 

In this paper, I critique Paul Christiano's (2013) argument that the future will be good. In Section 2, I reconstruct Christiano's argument in premise form and articulate some simplifying... 

EDIT: Read a summary of this post on Twitter

Working in the field of genetics is a bizarre experience. No one seems to be interested in the most interesting applications of their research.

We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create superbabies. Children that would live multiple decades longer than their non-engineered peers, have the raw intellectual horsepower to do Nobel prize worthy scientific research, and very rarely suffer from depression or other mental health disorders.

The scientific establishment,...

Standard deviations are used to characterize the spread around the mean of a normal distribution -- they are not intended to characterize the tails. This is why discussion around them tends to focus on 1-2 SDs, where the bulk of the data is, and rarely 3-4 SDs -- it is rare to have data (of sufficient size or low enough noise) to support meaningful interpretation of even 4 SDs in real-world settings.

So in practice, using precise figures like 5, 7, or 20 SDs is misleading, because the tails aren't usually sufficiently characterized (and it certainly isn't w... (read more)
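To put numbers on that, here is a quick illustration assuming an exactly normal distribution; real-world traits typically deviate from normality in the tails, which makes precise far-tail figures even less meaningful.

```python
# Expected number of observations beyond +k standard deviations in a sample
# of one million draws from a perfect standard normal distribution.
from math import erf, sqrt

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal."""
    return 0.5 * (1 - erf(k / sqrt(2)))

n = 1_000_000
for k in [1, 2, 3, 4, 5, 7]:
    print(f"+{k} SD: expect ~{n * upper_tail(k):.3g} of {n:,} observations")
# +4 SD: ~32 observations; +5 SD: ~0.3; +7 SD: ~1e-6 -- so claims about
# "5, 7, or 20 SDs" extrapolate far beyond what any real dataset pins down.
```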


Every now and then, some AI luminaries

  • (1) propose that the future of powerful AI will be reinforcement learning agents—an algorithm class that in many ways has more in common with MuZero (2019) than with LLMs; and
  • (2) propose that the technical problem of making these powerful future AIs follow human commands and/or care about human welfare—as opposed to, y’know, the Terminator thing—is a straightforward problem that they already know how to solve, at least in broad outline.

I agree with (1) and strenuously disagree with (2).

The last time I saw something like this, I responded by writing: LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem.

Well, now we have a second entry in the series, with the new preprint book chapter “Welcome to the Era of Experience” by...

I think this is revealing some differences of terminology and intuitions between us. To start with, in the §2.1 definitions, both "goal misgeneralization" and "specification gaming" (a.k.a. "reward hacking") can be associated with "competent pursuit of goals we don't want", whereas you seem to be treating "goal misgeneralization" as a competent thing and "reward hacking" as harmless but useless. And "reward hacking" is broader than wireheading.

For example, if the AI forces the user into eternal cardio training on pain of death, and accordingly the reward

... (read more)
3Steven Byrnes
I’ve gone back and forth about whether I should be thinking more about (A) "egregious scheming followed by violent takeover" versus (B) more subtle things, e.g. related to "different underlying priors for doing philosophical value reflection". This post emphasizes (A), because it’s in response to the Silver & Sutton proposal that doesn’t even clear that low bar of (A). So forget about (B).

There’s a school of thought that says that, if we can get past (A), then we can muddle our way through (B) as well, because if we avoid (A) then we get something like corrigibility and common-sense helpfulness, including checking in before doing irreversible things, and helping with alignment research and oversight. I think this is a rather popular school of thought these days, and is one of the major reasons why the median P(doom) among alignment researchers is probably "only" 20% or whatever, as opposed to much higher. I’m not sure whether I buy that school of thought or not. I’ve been mulling it over and am hoping to discuss it in a forthcoming post. (But it’s moot if we can’t even solve (A).) Regardless, I’m allowed to talk about how (A) is a problem, whether or not (B) is also a problem. :)

I think it would! I think social instincts are in the "non-behaviorist" category, wherein there’s a ground-truth primary reward that depends on what you’re thinking about. And believing that a computer program is suffering is a potential trigger.

…I might respond to the rest of your comment in our other thread (when I get a chance).