License: arXiv.org perpetual non-exclusive license
arXiv:2501.19393v3 [cs.CL] 01 Mar 2025

s1: Simple test-time scaling

Niklas Muennighoff    Zitong Yang    Weijia Shi    Xiang Lisa Li    Li Fei-Fei    Hannaneh Hajishirzi    Luke Zettlemoyer    Percy Liang    Emmanuel Candès    Tatsunori Hashimoto
Abstract

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI’s o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1.

Machine Learning, ICML, Large language models, Test-time scaling, Test-time compute

1 Introduction

Figure 1: Test-time scaling with s1-32B. We benchmark s1-32B on reasoning-intensive tasks and vary test-time compute.

Performance improvements of language models (LMs) over the past years have largely relied on scaling up train-time compute using large-scale self-supervised pretraining (Kaplan et al., 2020; Hoffmann et al., 2022). The creation of these powerful models has set the stage for a new scaling paradigm built on top of them: test-time scaling. The aim of this approach is to increase the compute at test time to get better results. There has been much work exploring this idea (Snell et al., 2024; Welleck et al., 2024), and the viability of this paradigm was recently validated by OpenAI o1 (OpenAI, 2024). o1 has demonstrated strong reasoning performance with consistent gains from scaling test-time compute. OpenAI describes their approach as using large-scale reinforcement learning (RL) implying the use of sizable amounts of data (OpenAI, 2024). This has led to various attempts to replicate their models relying on techniques like Monte Carlo Tree Search (Gao et al., 2024b; Zhang et al., 2024a), multi-agent approaches (Qin et al., 2024), and others (Wang et al., 2024a; Huang et al., 2024b, 2025). Among these approaches, DeepSeek R1 (DeepSeek-AI et al., 2025) has successfully replicated o1-level performance, also employing reinforcement learning via millions of samples and multiple training stages. However, despite the large number of o1 replication attempts, none have openly replicated a clear test-time scaling behavior. Thus, we ask: what is the simplest approach to achieve both test-time scaling and strong reasoning performance?

We show that training on only 1,000 samples with next-token prediction and controlling thinking duration via a simple test-time technique we refer to as budget forcing leads to a strong reasoning model that scales in performance with more test-time compute. Specifically, we construct s1K, which consists of 1,000 carefully curated questions paired with reasoning traces and answers distilled from Gemini Thinking Experimental (Google, 2024). We perform supervised fine-tuning (SFT) of an off-the-shelf pretrained model on our small dataset requiring just 26 minutes of training on 16 H100 GPUs. After training, we control the amount of test-time compute our model spends using budget forcing: (I) If the model generates more thinking tokens than a desired limit, we forcefully end the thinking process by appending an end-of-thinking token delimiter. Ending the thinking this way makes the model transition to generating its answer. (II) If we want the model to spend more test-time compute on a problem, we suppress the generation of the end-of-thinking token delimiter and instead append “Wait” to the model’s current reasoning trace to encourage more exploration. Equipped with this simple recipe – SFT on 1,000 samples and test-time budget forcing – our model s1-32B exhibits test-time scaling (Figure 1). Further, s1-32B is the most sample-efficient reasoning model and outperforms closed-source models like OpenAI’s o1-preview (Figure 2).

We conduct extensive ablation experiments targeting (a) our selection of 1,000 (1K) reasoning samples and (b) our test-time scaling. For (a), we find that jointly incorporating difficulty, diversity, and quality measures into our selection algorithm is important. Random selection, selecting samples with the longest reasoning traces, or only selecting maximally diverse samples all lead to significantly worse performance (around −30% on AIME24 on average). Training on our full data pool of 59K examples, a superset of s1K, does not offer substantial gains over our 1K selection. This highlights the importance of careful data selection and echoes prior findings for instruction tuning (Zhou et al., 2023). For (b), we define desiderata for test-time scaling methods to compare different approaches. Budget forcing leads to the best scaling as it has perfect controllability with a clear positive slope leading to strong performance.

In summary, our contributions are: We develop simple methods for creating a sample-efficient reasoning dataset (§2) and test-time scaling (§3); Based on these we build s1-32B which is competitive with o1-preview (§4); We ablate subtleties of data (§5.1) and test-time scaling (§5.2). We end with a discussion to motivate future work on simple reasoning (§6). Our code, model, and data are open-source at https://github.com/simplescaling/s1.

Figure 2: s1K and s1-32B. (left) s1K is a dataset of 1,000 high-quality, diverse, and difficult questions with reasoning traces. (right) s1-32B, a 32B parameter model finetuned on s1K, is on the sample-efficiency frontier. See Table 1 for details on other models.

2 Reasoning data curation to create s1K

In this section, we describe our process for creating a large dataset first in §2.1 and then filtering it down to s1K in §2.2.

2.1 Initial collection of 59K samples

We collect an initial 59,029 questions from 16 sources following three guiding principles. Quality: Datasets should be high-quality; we always inspect samples and ignore datasets with, e.g., poor formatting; Difficulty: Datasets should be challenging and require significant reasoning effort; Diversity: Datasets should stem from various fields to cover different reasoning tasks. We collect datasets of two categories:

Curation of existing datasets

Our largest source is NuminaMATH (LI et al., 2024) with 30,660 mathematical problems from online websites. We also include historical AIME problems (1983-2021). To enhance diversity, we add OlympicArena (Huang et al., 2024a) with 4,250 questions spanning Astronomy, Biology, Chemistry, Computer Science, Geography, Mathematics, and Physics from various Olympiads. OmniMath (Gao et al., 2024a) adds 4,238 competition-level mathematics problems. We also include 2,385 problems from AGIEval (Zhong et al., 2023), which features questions from standardized tests like SAT and LSAT, covering English, Law, and Logic. We refer to Table 7 in §C for our other sources.

New datasets in quantitative reasoning

To complement these existing datasets, we create two original datasets. s1-prob consists of 182 questions from the probability section of Stanford University’s Statistics Department’s PhD Qualifying Exams (https://statistics.stanford.edu), accompanied by handwritten solutions that cover difficult proofs. The probability qualifying exam is held yearly and requires professional-level mathematical problem-solving. s1-teasers comprises 23 challenging brain-teasers commonly used in interview questions for quantitative trading positions. Each sample consists of a problem and solution taken from PuzzledQuant (https://www.puzzledquant.com/). We only take examples with the highest difficulty level (“Hard”).

For each question, we generate a reasoning trace and solution using the Google Gemini Flash Thinking API (Google, 2024) extracting its reasoning trace and response. This yields 59K triplets of a question, generated reasoning trace, and generated solution. Examples from our dataset are in §D.2. We decontaminate all samples against our evaluation questions (MATH500, GPQA Diamond, AIME24; §C.5) using 8-grams and deduplicate the data.
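The decontamination and deduplication step can be illustrated with a minimal sketch. It assumes word-level 8-grams over lowercased, whitespace-split text and exact-match deduplication; the actual tokenization and matching rules are described in §C.5 and may differ. The function names `ngrams` and `decontaminate` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in `text`.

    Assumption: lowercase + whitespace tokenization (not specified in the text).
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(samples: List[str], eval_questions: Iterable[str], n: int = 8) -> List[str]:
    """Drop samples sharing any n-gram with an evaluation question, then exact duplicates."""
    eval_grams: Set[Tuple[str, ...]] = set()
    for question in eval_questions:
        eval_grams |= ngrams(question, n)

    kept: List[str] = []
    seen: Set[str] = set()
    for sample in samples:
        if ngrams(sample, n) & eval_grams:
            continue  # overlaps with an evaluation question
        if sample in seen:
            continue  # exact duplicate of an earlier sample
        seen.add(sample)
        kept.append(sample)
    return kept
```

Note that under this sketch a sample shorter than `n` words has no n-grams and therefore can never be flagged as contaminated.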

2.2 Final selection of 1K samples

We could directly train on our pool of 59K questions; however, our goal is to find the simplest approach with minimal resources. Thus, we go through three stages of filtering to arrive at a minimal set of 1,000 samples relying on our three guiding data principles: Quality, Difficulty, and Diversity.

Quality

We first remove any questions where we ran into any API errors, reducing our dataset to 54,116 samples. Next, we filter out low-quality examples by checking if they contain any string patterns with formatting issues, such as ASCII art diagrams, non-existent image references, or inconsistent question numbering, reducing our dataset to 51,581 examples. From this pool, we identify 384 samples for our final 1,000 samples from datasets that we perceive as high-quality and not in need of further filtering (see §C.4 for details).

Difficulty

For difficulty, we use two indicators: model performance and reasoning trace length. We evaluate two models on each question: Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct (Qwen et al., 2024), with correctness assessed by Claude 3.5 Sonnet comparing each attempt against the reference solution (see §C.3 for the grading protocol). We measure the token length of each reasoning trace to indicate problem difficulty using the Qwen2.5 tokenizer. This relies on the assumption that more difficult problems require more thinking tokens. Based on the grading, we remove questions that either Qwen2.5-7B-Instruct or Qwen2.5-32B-Instruct can solve correctly and thus may be too easy. By using two models we reduce the likelihood of an easy sample slipping through our filtering due to a rare mistake on an easy question of one of the models. This brings our total samples down to 24,496, setting the stage for the next round of subsampling based on diversity. While filtering with these two models may be optimized for our setup as we will also use Qwen2.5-32B-Instruct as our model to finetune, the idea of model-based filtering generalizes to other setups.
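The model-based difficulty filter described above reduces to a simple rule: a question survives only if neither model solved it. The sketch below assumes the grader verdicts have already been computed and stored per sample; the `Sample` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    question: str
    trace_tokens: int    # token length of the distilled reasoning trace
    solved_by_7b: bool   # grader verdict for the Qwen2.5-7B-Instruct attempt
    solved_by_32b: bool  # grader verdict for the Qwen2.5-32B-Instruct attempt


def filter_difficult(pool: List[Sample]) -> List[Sample]:
    """Keep only questions that neither model answered correctly."""
    return [s for s in pool if not (s.solved_by_7b or s.solved_by_32b)]
```

Requiring both models to fail is what guards against an easy question slipping through because one model made a rare mistake on it.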

Diversity

To quantify diversity, we classify questions into domains using Claude 3.5 Sonnet based on the Mathematics Subject Classification (MSC) system (e.g., geometry, combinatorics, etc.) from the American Mathematical Society (https://mathscinet.ams.org/mathscinet/msc/msc2020.html). The taxonomy focuses on topics in mathematics but also includes other sciences such as biology, physics, and economics. To select our final examples from the pool of 24,496 questions, we first choose one domain uniformly at random. Then, we sample one problem from this domain according to a distribution that favors longer reasoning traces (see §C.4 for details) as motivated in Difficulty. We repeat this process until we have 1,000 total samples spanning 50 domains.
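The two-level sampling loop above (uniform over domains, then length-biased within a domain) can be sketched as follows. The exact within-domain distribution is given in §C.4 and is not reproduced here; as a stand-in, this sketch assumes a weight proportional to the trace length, which is only one of many distributions that favor longer traces. The function name and the input layout are hypothetical, and trace lengths are assumed to be positive.

```python
import random
from typing import Dict, List, Tuple


def sample_diverse(
    pool_by_domain: Dict[str, List[Tuple[str, int]]],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Pick up to k question ids; pool_by_domain maps domain -> [(question_id, trace_length)]."""
    rng = random.Random(seed)
    # Copy the lists so the caller's pool is not mutated.
    remaining = {d: list(items) for d, items in pool_by_domain.items() if items}
    selected: List[str] = []
    while len(selected) < k and remaining:
        domain = rng.choice(sorted(remaining))     # uniform over non-empty domains
        items = remaining[domain]
        weights = [length for _, length in items]  # assumption: weight ∝ trace length
        idx = rng.choices(range(len(items)), weights=weights, k=1)[0]
        selected.append(items.pop(idx)[0])         # sample without replacement
        if not items:
            del remaining[domain]
    return selected
```

Because the domain is drawn uniformly rather than in proportion to its size, small domains are over-represented relative to the pool, which is the point of the diversity step.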

In §5.1, we will show that using our three criteria in combination is important, as only relying on quality, diversity, or difficulty in isolation leads to worse datasets. Some distilled generations are incorrect, which we allow in our data as we focus on capturing the reasoning process rather than entirely correct solutions. Our grader (§C.3) deems 53.6% correct in s1K and 63.0% in our follow-up s1K-1.1 (see §A).

3 Test-time scaling

Figure 3: Budget forcing with s1-32B. The model tries to stop after “…is 2.”, but we suppress the end-of-thinking token delimiter and instead append “Wait”, leading s1-32B to self-correct its answer.

3.1 Method

We classify test-time scaling methods into 1) Sequential, where later computations depend on earlier ones (e.g., a long reasoning trace), and 2) Parallel, where computations run independently (e.g., majority voting) (Snell et al., 2024; Brown et al., 2024). We focus on sequential scaling as intuitively we believe it should scale better, since later computations can build on intermediate results, allowing for deeper reasoning and iterative refinement. We propose new sequential scaling methods and ways to benchmark them.

Budget forcing

We propose a simple decoding-time intervention by forcing a maximum and/or minimum number of thinking tokens. Specifically, we enforce a maximum token count by simply appending the end-of-thinking token delimiter and optionally “Final Answer:” to early exit the thinking stage and make the model provide its current best answer. To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation. Figure 3 contains an example of how this simple approach can lead the model to arrive at a better answer.
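To make the two interventions concrete, here is a toy decoding loop, not the released implementation. It assumes a hypothetical `next_token` callable standing in for the model, a placeholder delimiter string `END_THINK`, and a minimum expressed as a number of suppressed stops (`min_waits`), in the spirit of the 2/4/6 extensions shown in Figure 4. The optional “Final Answer:” suffix is omitted.

```python
from typing import Callable, List

END_THINK = "<end_think>"  # placeholder for the end-of-thinking delimiter (assumption)
WAIT = "Wait"


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_thinking_tokens: int,
    min_waits: int = 0,
) -> List[str]:
    """Generate a thinking trace under budget forcing; the result always ends with END_THINK."""
    trace: List[str] = []
    waits_used = 0
    while len(trace) < max_thinking_tokens:
        token = next_token(prompt + trace)
        if token == END_THINK:
            if waits_used < min_waits:
                # Minimum not reached: suppress the delimiter and append "Wait"
                # so the model keeps reasoning from its current trace.
                trace.append(WAIT)
                waits_used += 1
                continue
            break  # the model stopped on its own and the minimum is satisfied
        trace.append(token)
    # Either the model stopped or the budget ran out: close the thinking stage.
    trace.append(END_THINK)
    return trace
```

In this sketch the appended “Wait” tokens count toward `max_thinking_tokens`, so the maximum always takes precedence over the minimum; whether that matches the released code is not stated in the text.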

Baselines

We benchmark budget forcing with: (I) Conditional length-control methods, which rely on telling the model in the prompt how long it should generate for. We group them by granularity into (a) Token-conditional control: We specify an upper bound of thinking tokens in the prompt; (b) Step-conditional control: We specify an upper bound of thinking steps, where each step is around 100 tokens; (c) Class-conditional control: We write two generic prompts that tell the model to either think for a short or long amount of time (see §E.1 for details). (II) Rejection sampling, which samples until a generation fits a predetermined compute budget. This oracle captures the posterior over responses conditioned on its length.

3.2 Metrics

We establish a set of desiderata as evaluation metrics to measure test-time scaling across methods. Importantly, we do not only care about the accuracy a method can achieve but also its controllability and test-time scaling slope. For each method we consider, we run a set of evaluations $a \in \mathcal{A}$ varying test-time compute on a fixed benchmark, e.g. AIME24. This produces a piece-wise linear function $f$ with compute as the x-axis measured in thinking tokens and accuracy as the y-axis (see Figure 1, where the rightmost dot for AIME24 corresponds to $f(7320) = 57\%$). We measure three metrics:

$$\text{Control} = \frac{1}{|\mathcal{A}|} \sum_{a \in \mathcal{A}} \mathbb{I}(a_{\text{min}} \leq a \leq a_{\text{max}}) \tag{1}$$

where $a_{\text{min}}, a_{\text{max}}$ refer to a pre-specified minimum and maximum amount of test-time compute; in our case thinking tokens. We usually only constrain $a_{\text{max}}$. As tokens generated correspond to the amount of test-time compute spent, this metric measures the extent to which a method allows controllability over the use of that test-time compute. We report it as a percentage with 100% being perfect control.

$$\text{Scaling} = \frac{1}{\binom{|\mathcal{A}|}{2}} \sum_{\substack{a, b \in \mathcal{A} \\ b > a}} \frac{f(b) - f(a)}{b - a} \tag{2}$$

Scaling is the average slope of the piece-wise linear function. It must be positive for useful methods and larger is better.

$$\text{Performance} = \max_{a \in \mathcal{A}} f(a) \tag{3}$$

Performance is simply the maximum performance the method achieves on the benchmark. A method with monotonically increasing scaling achieves 100% performance on any benchmark in the limit. However, the methods we investigate eventually flatten out or further scaling fails due to control or context window limitations.
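The three metrics translate directly into code. The sketch below assumes each evaluation is summarized by its measured thinking-token count and its accuracy, stored as a mapping from compute to accuracy; the function names are hypothetical and the numbers in the usage note are made up for illustration, not results from the paper.

```python
from itertools import combinations
from typing import Dict, Sequence


def control(tokens_used: Sequence[float], a_max: float, a_min: float = 0.0) -> float:
    """Eq. (1): fraction of evaluations whose compute lies within [a_min, a_max]."""
    return sum(a_min <= a <= a_max for a in tokens_used) / len(tokens_used)


def scaling(f: Dict[float, float]) -> float:
    """Eq. (2): slope averaged over all pairs (a, b) with b > a. Needs at least two points."""
    pairs = list(combinations(sorted(f), 2))  # sorted keys guarantee a < b in each pair
    return sum((f[b] - f[a]) / (b - a) for a, b in pairs) / len(pairs)


def performance(f: Dict[float, float]) -> float:
    """Eq. (3): best accuracy reached at any compute level."""
    return max(f.values())
```

For example, with the illustrative curve `{1000: 0.40, 2000: 0.50, 4000: 0.60}`, `performance` returns `0.60` and `scaling` averages the three pairwise slopes, so a curve that flattens at high compute is penalized relative to one that keeps rising.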

(a) Sequential scaling via budget forcing
(b) Parallel scaling via majority voting
Figure 4: Sequential and parallel test-time scaling. (a): Budget forcing shows clear scaling trends and extrapolates to some extent. For the three rightmost dots, we prevent the model from stopping its thinking 2/4/6 times, each time appending “Wait” to its current reasoning trace. (b): For Qwen2.5-32B-Instruct we perform 64 evaluations for each sample with a temperature of 1 and visualize the performance when majority voting across 2, 4, 8, 16, 32, and 64 of these.

4 Results

4.1 Setup

Training

We perform supervised finetuning on Qwen2.5-32B-Instruct using s1K to obtain our model s1-32B using basic hyperparameters outlined in §D. Finetuning took 26 minutes on 16 NVIDIA H100 GPUs with PyTorch FSDP.

Evaluation

We select three representative reasoning benchmarks widely used in the field: AIME24 (Mathematical Association of America, 2024) has 30 problems that were used in the 2024 American Invitational Mathematics Examination (AIME) held from January 31 – February 1, 2024. AIME tests mathematical problem-solving with arithmetic, algebra, counting, geometry, number theory, probability, and other secondary school math topics. High-scoring high school students in the test are invited to participate in the United States of America Mathematics Olympiad (USAMO). All AIME answers are integers ranging from $000$ to $999$, inclusive. Some AIME problems rely on figures that we provide to our model using the vector graphics language Asymptote as it cannot take image inputs. MATH500 (Hendrycks et al., 2021) is a benchmark of competition math problems of varying difficulty. We evaluate on the same 500 samples selected by OpenAI in prior work (Lightman et al., 2023). GPQA Diamond (Rein et al., 2023) consists of 198 PhD-level science questions from Biology, Chemistry and Physics. Experts with PhDs in the corresponding domains only achieved 69.7% on GPQA Diamond (OpenAI, 2024). When we write “GPQA” in the context of evaluation in this work, we always refer to the Diamond subset. We build on the “lm-evaluation-harness” framework (Gao et al., 2021; Biderman et al., 2024). Unless otherwise specified, we evaluate with a temperature of 0 (greedy) and measure accuracy (equivalent to pass@1).

Other models

We benchmark s1-32B against: OpenAI o1 series (OpenAI, 2024), closed-source models that popularized test-time scaling; DeepSeek r1 series (DeepSeek-AI et al., 2025), open-weight reasoning models with up to o1-level performance; Qwen’s QwQ-32B-preview (Team, 2024), a 32B open-weight reasoning model without disclosed methodology; Sky-T1-32B-Preview (Team, 2025) and Bespoke-32B (Labs, 2025), open models with open reasoning data distilled from QwQ-32B-preview and r1; Google Gemini 2.0 Flash Thinking Experimental (Google, 2024), the API that we distill from. As it has no official evaluation scores, we use the Gemini API to benchmark it ourselves. However, the “recitation error” of the Gemini API makes evaluation challenging (https://github.com/google/generative-ai-docs/issues/257). We circumvent this by manually inserting all 30 AIME24 questions in its web interface, where the error does not appear. However, we leave out MATH500 (500 questions) and GPQA Diamond (198 questions), thus they are N.A. in Table 1. Our model, s1-32B, is fully open including weights, reasoning data, and code.

Table 1: s1-32B is a strong open reasoning model. We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4). Other results are from the respective reports (Qwen et al., 2024; Team, 2024; OpenAI, 2024; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025). # ex. = number of examples used for reasoning finetuning; BF = budget forcing. See §A for our better s1.1 model.

Model                      # ex.    AIME 2024   MATH 500   GPQA Diamond
API only
o1-preview                 N.A.     44.6        85.5       73.3
o1-mini                    N.A.     70.0        90.0       60.0
o1                         N.A.     74.4        94.8       77.3
Gemini 2.0 Flash Think.    N.A.     60.0        N.A.       N.A.
Open Weights
Qwen2.5-32B-Instruct       N.A.     26.7        84.0       49.0
QwQ-32B                    N.A.     50.0        90.6       54.5
r1                         ≫800K    79.8        97.3       71.5
r1-distill                 800K     72.6        94.3       62.1
Open Weights and Open Data
Sky-T1                     17K      43.3        82.4       56.8
Bespoke-32B                17K      63.3        93.0       58.1
s1 w/o BF                  1K       50.0        92.6       56.6
s1-32B                     1K       56.7        93.0       59.6

4.2 Performance

Test-time scaling

Figure 1 shows that the performance of s1-32B with budget forcing scales with more test-time compute. In Figure 4 (left), we expand the plot from Figure 1 (middle), showing that while we can improve AIME24 performance using our budget forcing technique (§3) and more test-time compute, it eventually flattens out once we suppress the end of thinking six times. Suppressing the end-of-thinking token delimiter too often can lead the model into repetitive loops instead of continued reasoning. In Figure 4 (right), we show that after training Qwen2.5-32B-Instruct on our 1,000 samples to produce s1-32B and equipping it with the simple budget forcing technique, it operates in a different scaling paradigm. Scaling test-time compute on the base model via majority voting cannot catch up with the performance of s1-32B, which validates our intuition from §3 that sequential scaling is more effective than parallel. We provide example generations of s1-32B in Figure 5.
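For reference, the parallel baseline amounts to sampling several answers per question and keeping the most frequent one. The following is a minimal sketch assuming answers have already been extracted as strings; the function names are hypothetical, and ties are broken in favor of the answer seen first, which is an arbitrary choice not specified in the text.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most common answer; among ties, the one encountered first wins."""
    return Counter(answers).most_common(1)[0][0]


def voting_accuracy(samples_per_question: List[List[str]], gold: List[str], k: int) -> float:
    """Accuracy when voting over the first k sampled answers of each question."""
    correct = sum(
        majority_vote(samples[:k]) == answer
        for samples, answer in zip(samples_per_question, gold)
    )
    return correct / len(gold)
```

Calling `voting_accuracy` with k = 2, 4, 8, ... on the same set of sampled answers yields one accuracy per vote count, which is the kind of curve plotted for the parallel setting.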

Sample-efficiency

In Figure 2 (right) and Table 1, we compare s1-32B with other models. We find that s1-32B is the most sample-efficient open data reasoning model. It performs significantly better than our base model (Qwen2.5-32B-Instruct) despite just training it on an additional 1,000 samples. The concurrently released r1-32B shows stronger performance than s1-32B while also only using SFT (DeepSeek-AI et al., 2025). However, it is trained on 800× more reasoning samples. It is an open question whether one can achieve their performance with just 1,000 samples. Finally, our model nearly matches Gemini 2.0 Thinking on AIME24. As the data for s1-32B is distilled from Gemini 2.0, this shows our distillation procedure was likely effective.

[AIME24] Alice and Bob play the following game. A stack of $n$ tokens lies before them. The players take turns with Alice going first. On each turn, the player removes either $1$ token or $4$ tokens from the stack. Whoever removes the last token wins. Find the number of positive integers $n$ less than or equal to $2024$ for which there exists a strategy for Bob that guarantees that Bob will win the game regardless of Alice’s play.
The problem asks for the number of positive integers $n \leq 2024$ such that Bob has a winning strategy. This is a typical impartial game that can be analyzed using the Sprague-Grundy theorem or by directly finding the winning and losing positions. Let $W$ be the set of winning positions (number of tokens $n$ such that the player whose turn it is can force a win) and $L$ be the set of losing positions. […]
Let $P(n)$ be the statement “the player whose turn it is when there are $n$ tokens has a winning strategy.” $P(0)$ is false (the player whose turn it is and faces $0$ tokens has already lost). Consider the small values of $n$:
$n = 1$: The player whose turn it is can take $1$ token and win. So $P(1)$ is true. $1 \in W$.
$n = 2$: The player whose turn it is can take $1$ token, leaving $1$ token for the opponent. Since $1 \in W$, the opponent wins. So $P(2)$ is false. $2 \in L$.
$n = 3$: The player whose turn it is can take $1$ token, leaving $2$ tokens for the opponent. Since $2 \in L$, the player whose turn it is wins. So $P(3)$ is true. $3 \in W$. […]
We are looking for the number of positive integers $n \leq 2024$ for which Bob has a winning strategy. This means that if the game starts with $n$ tokens, and Alice goes first, Bob can win. This happens if and only if $n$ is a losing position (because if $n \in L$, then Alice’s first move must lead to a winning position for Bob). […]
Let’s try to find a pattern in the losing positions $L = \{0, 2, 5, 7, 10, 12, 15, 17, 20, 22, \ldots\}$. Differences between consecutive terms: $2, 3, 2, 3, 2, 3, 2, 3, 2, \ldots$ The pattern of differences is alternating $2, 3$. […]
The losing positions are of the form $5m$ or $5m + 2$, where $m \geq 0$. We need to find the number of positive integers $n \leq 2024$ that are in the set of losing positions $L$. […]
Case 1: $n = 5m$, where $m \geq 1$. $1 \leq 5m \leq 2024$, so $1/5 \leq m \leq 2024/5 = 404.8$. Since $m$ is an integer, $1 \leq m \leq 404$. The number of values of $m$ is $404$.
Case 2: $n = 5m + 2$, where $m \geq 0$. $5m + 2 \leq 2024$, so $5m \leq 2022$ and $m \leq 2022/5 = 404.4$. Since $m$ is an integer, $0 \leq m \leq 404$. The number of values of $m$ is $404 - 0 + 1 = 405$.
The total number of positive integers $n \leq 2024$ that are losing positions is $404 + 405 = \boxed{809}$.
[MATH500] Find the projection of $a$ onto $b = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix}$ if $a \cdot b = 8.$
The projection of vector $\mathbf{a}$ onto vector $\mathbf{b}$ is given by the formula: $\operatorname{proj}_{\mathbf{b}} \mathbf{a} = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^{2}} \mathbf{b}$. We are given $\mathbf{b} = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix}$ and $\mathbf{a} \cdot \mathbf{b} = 8.$
First, we need to find the magnitude squared of $\mathbf{b}$, which is $\|\mathbf{b}\|^{2}$. $\|\mathbf{b}\|^{2} = \mathbf{b} \cdot \mathbf{b} = 2^2 + 6^2 + 3^2 = 4 + 36 + 9 = 49.$
Now we can plug the values into the projection formula: $\operatorname{proj}_{\mathbf{b}} \mathbf{a} = \frac{8}{49} \mathbf{b}$. Substitute the vector $\mathbf{b}$: $\operatorname{proj}_{\mathbf{b}} \mathbf{a} = \frac{8}{49} \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix}$
Multiply the scalar by each component of the vector: $\operatorname{proj}_{\mathbf{b}} \mathbf{a} = \begin{pmatrix} \frac{8}{49} \times 2 \\ \frac{8}{49} \times 6 \\ \frac{8}{49} \times 3 \end{pmatrix} = \begin{pmatrix} \frac{16}{49} \\ \frac{48}{49} \\ \frac{24}{49} \end{pmatrix}$
Final Answer: The final answer is $\begin{pmatrix} \frac{16}{49} \\ \frac{48}{49} \\ \frac{24}{49} \end{pmatrix}$
[GPQA] A quantum mechanical particle of mass $m$ moves in two dimensions in the following potential, as a function of the polar coordinates $(r, \theta)$: $V(r, \theta) = \frac{1}{2} k r^2 + \frac{3}{2} k r^2 \cos^2(\theta)$. Find the energy spectrum. Hint: Write the potential in Cartesian coordinates.
The problem asks for the energy spectrum of a particle in a given 2D potential. The potential is given in polar coordinates $(r, \theta)$: $V(r, \theta) = \frac{1}{2} k r^2 + \frac{3}{2} k r^2 \cos^2(\theta)$. We can rewrite the potential as: $V(r, \theta) = \frac{1}{2} k r^2 (1 + 3\cos^2(\theta))$. The Hamiltonian for a particle in 2D is given by: $H = -\frac{\hbar^2}{2m} \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) + V(x, y)$. In polar coordinates $(r, \theta)$, the Laplacian operator is: $\nabla^2 = \frac{1}{r} \frac{\partial}{\partial r} \left( r \frac{\partial}{\partial r} \right) + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2}$. So the Hamiltonian in polar coordinates is: $H = -\frac{\hbar^2}{2m} \left[ \frac{1}{r} \frac{\partial}{\partial r} \left( r \frac{\partial}{\partial r} \right) + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2} \right] + V(r, \theta)$. The time-independent Schrödinger equation is $H\psi = E\psi$. We can try to separate variables by assuming a solution of the form $\psi(r, \theta) = R(r)\Theta(\theta)$. […]
This confirms the energy spectrum is: $E = \hbar \sqrt{\frac{k}{m}} \left( 2n_x + n_y + \frac{3}{2} \right)$. Final Answer: $E = \hbar \sqrt{\frac{k}{m}} \left( 2n_x + n_y + \frac{3}{2} \right)$.
Figure 5: Example model outputs. We pick a question from AIME24 (left), MATH500 (middle), and GPQA (right), where our model generates the correct answer. The black text is the prompt, the light blue text is the reasoning trace, and the blue text is the answer of s1-32B. The gray ellipsis […] indicates that the text was trimmed to fit this page, but the generated text is actually longer.
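The AIME24 answer in Figure 5 can be checked independently with a small dynamic program over winning and losing positions. The sketch below is a reader's verification, not part of the model output; the function name `losing_positions` is illustrative.

```python
def losing_positions(limit, moves=(1, 4)):
    """Return all n in [0, limit] where the player to move loses under optimal play."""
    # wins[n] is True when the player to move with n tokens can force a win.
    # wins[0] stays False: facing 0 tokens means the opponent took the last one.
    wins = [False] * (limit + 1)
    for n in range(1, limit + 1):
        # A position is winning if some legal move leaves the opponent in a losing one.
        wins[n] = any(n >= m and not wins[n - m] for m in moves)
    return [n for n in range(limit + 1) if not wins[n]]


losing = losing_positions(2024)
bob_wins = [n for n in losing if n >= 1]  # the question counts positive n only
print(losing[:10])    # → [0, 2, 5, 7, 10, 12, 15, 17, 20, 22]
print(len(bob_wins))  # → 809
```

The enumeration reproduces both the pattern of losing positions ($n \equiv 0$ or $2 \pmod 5$) and the final count of 809 given in the trace.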
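The GPQA trace in Figure 5 is trimmed before the step suggested by the hint. As a reader's check (not a reproduction of the elided model output), substituting $x = r\cos\theta$ and $y = r\sin\theta$ turns the potential into two independent harmonic oscillators, which gives the stated spectrum:

```latex
\begin{aligned}
V(x, y) &= \tfrac{1}{2} k (x^2 + y^2) + \tfrac{3}{2} k x^2
         = 2 k x^2 + \tfrac{1}{2} k y^2
         = \tfrac{1}{2} m \omega_x^2 x^2 + \tfrac{1}{2} m \omega_y^2 y^2, \\
\omega_x &= 2\sqrt{\tfrac{k}{m}}, \qquad \omega_y = \sqrt{\tfrac{k}{m}}, \\
E &= \hbar \omega_x \left(n_x + \tfrac{1}{2}\right) + \hbar \omega_y \left(n_y + \tfrac{1}{2}\right)
   = \hbar \sqrt{\tfrac{k}{m}} \left(2 n_x + n_y + \tfrac{3}{2}\right).
\end{aligned}
```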

5 Ablations

5.1 Data Quantity, Diversity, and Difficulty

Table 2: s1K data ablations. We budget force (BF) a maximum of around 30,000 thinking tokens for all scores in this table. This performs slightly better than the scores without BF (Table 1) as it allows the model to finish with a best guess when stuck in an infinite loop. We report 95% paired bootstrap confidence intervals for differences relative to the s1K model using 10,000 bootstrap samples. E.g., the interval [-13%, 20%] means that, with 95% confidence, the true difference between 59K-full and s1K is between -13% and +20%. If the entire interval is negative, e.g. [-27%, -3%], we can confidently say that the performance is worse than s1K.
Model        AIME 2024           MATH 500          GPQA Diamond
1K-random    36.7                90.6              52.0
             [-26.7%, -3.3%]     [-4.8%, 0.0%]     [-12.6%, 2.5%]
1K-diverse   26.7                91.2              54.6
             [-40.0%, -10.0%]    [-4.0%, 0.2%]     [-10.1%, 5.1%]
1K-longest   33.3                90.4              59.6
             [-36.7%, 0.0%]      [-5.0%, -0.2%]    [-5.1%, 10.1%]
59K-full     53.3                92.8              58.1
             [-13.3%, 20.0%]     [-2.6%, 2.2%]     [-6.6%, 8.6%]
s1K          50.0                93.0              57.6
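The confidence intervals in Table 2 come from a paired bootstrap over benchmark questions. The following is a minimal sketch of one common way to compute such an interval, assuming per-question scores for both models and a percentile interval; the function name, the percentile method, and the toy data are assumptions for illustration and not taken from the authors' code.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_a) - mean(scores_b), resampling questions in pairs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired bootstrap needs one score per question for both models")
    rng = random.Random(seed)
    n = len(scores_a)
    # Per-question differences keep the pairing between the two models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    boot_means = []
    for _ in range(n_boot):
        # Resample question indices with replacement and average the paired differences.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    boot_means.sort()
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical per-question correctness (1 = solved) on a 30-question benchmark.
ablation = [1] * 12 + [0] * 18
reference = [1] * 15 + [0] * 15
lo, hi = paired_bootstrap_ci(ablation, reference)
print(f"95% CI for the difference: [{lo:+.1%}, {hi:+.1%}]")
```

Resampling the paired differences, rather than each model's scores separately, is what makes the interval "paired": question-level difficulty that affects both models cancels out.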

In §2 we outlined our three guiding principles in curating s1K: Quality, Difficulty, and Diversity. Here we test the importance of combining them and the overall efficacy of our selection.

Only Quality (1K-random): After obtaining our high-quality reasoning chains from Gemini, we select 1,000 samples at random, without relying on our difficulty and diversity filtering at all. Table 2 shows this approach performs much worse than s1K across all benchmarks.

Only Diversity (1K-diverse): For this dataset, we sample uniformly across domains to maximize diversity, disregarding any notion of difficulty. This approach also leads to poor performance, similar to 1K-random.

Only Difficulty (1K-longest): Here we rely on one of our difficulty indicators introduced in §2 by selecting the 1,000 samples with the longest reasoning traces. This approach significantly boosts GPQA performance but overall still falls short of using s1K.

Maximize Quantity: Finally, we compare with just training on all of our 59K samples, a superset of all the 1K-sample versions. This leads to a strong model but uses much more resources. To finetune on 59K samples, we use 394 H100 GPU hours while s1-32B only required 7 H100 GPU hours. Moreover, relying only on s1K is extremely competitive as shown in §2.

Overall, combining all three criteria (Quality, Difficulty, and Diversity) via our methodology in §2 is key for sample-efficient reasoning training.
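To make the three 1K ablation subsets concrete, here is a rough sketch of how such selections could be drawn from a pool of samples. The record fields `domain` and `trace_tokens`, the function names, and the exact sampling scheme for the diverse variant are assumptions for illustration; the authors' actual selection code may differ.

```python
import random
from collections import defaultdict


def select_random(pool, k, rng):
    """1K-random style: uniform sample, ignoring difficulty and domain."""
    return rng.sample(pool, k)


def select_longest(pool, k):
    """1K-longest style: keep the k samples with the longest reasoning traces."""
    return sorted(pool, key=lambda s: s["trace_tokens"], reverse=True)[:k]


def select_diverse(pool, k, rng):
    """1K-diverse style: repeatedly pick a domain uniformly, then one sample from it."""
    by_domain = defaultdict(list)
    for sample in pool:
        by_domain[sample["domain"]].append(sample)
    for bucket in by_domain.values():
        rng.shuffle(bucket)
    chosen = []
    while len(chosen) < k and by_domain:
        domain = rng.choice(sorted(by_domain))
        chosen.append(by_domain[domain].pop())
        if not by_domain[domain]:
            del by_domain[domain]  # this domain has no samples left
    return chosen


# Hypothetical pool: 100 samples spread over 5 domains.
rng = random.Random(0)
pool = [{"domain": f"d{i % 5}", "trace_tokens": i} for i in range(100)]
print(len(select_random(pool, 10, rng)), len(select_diverse(pool, 10, rng)))
print([s["trace_tokens"] for s in select_longest(pool, 3)])  # → [99, 98, 97]
```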

5.2 Test-time scaling methods

Table 3: Ablations on methods to scale test-time compute on AIME24. $|\mathcal{A}|$ refers to the number of evaluation runs used to estimate the properties; thus a higher value indicates more robustness. Bold indicates our chosen method and the best values. BF = budget forcing, TCC/SCC/CCC = token/step/class-conditional control, RS = rejection sampling.
Method      Control   Scaling   Performance   $|\mathcal{A}|$
BF          100%      15        56.7          5
TCC         40%       -24       40.0          5
TCC + BF    100%      13        40.0          5
SCC         60%       3         36.7          5
SCC + BF    100%      6         36.7          5
CCC         50%       25        36.7          2
RS          100%      -35       40.0          5

Budget forcing

In Table 3 we compare the test-time scaling methods we have introduced in §3. Overall, we find that budget forcing provides perfect control, good scaling, and leads to our best AIME24 score. Thus, this is the method we use for s1-32B in Figure 1 and in §4. In Table 4, we compare different strings for extrapolating performance. We find that “Wait” generally gives the best performance.
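As a rough illustration of the control flow behind budget forcing, the sketch below wraps a hypothetical `generate` callable: it cuts thinking off at a token budget and, when the model tries to stop early, suppresses the end-of-thinking delimiter and appends a continuation string such as “Wait”. The callable's signature, the list-of-tokens representation, and the toy model are assumptions made for this example and do not reflect the released s1 code.

```python
def budget_forced_thinking(generate, prompt, max_thinking_tokens,
                           num_extensions=0, extend_str="Wait"):
    """Collect thinking tokens under a budget, optionally extending the thinking.

    `generate(prompt, thinking, max_new_tokens)` is a hypothetical callable that
    returns `(new_tokens, ended)`. `ended` is True when the model tried to emit its
    end-of-thinking delimiter (the delimiter itself is not in `new_tokens`).
    """
    thinking = []
    extensions_left = num_extensions
    while len(thinking) < max_thinking_tokens:
        remaining = max_thinking_tokens - len(thinking)
        new_tokens, ended = generate(prompt, thinking, remaining)
        thinking.extend(new_tokens[:remaining])
        if not ended or extensions_left == 0:
            # Either the budget cut the model off, or it stopped on its own and
            # no further extension was requested.
            break
        # Suppress the end-of-thinking delimiter and append the continuation string.
        thinking.append(extend_str)
        extensions_left -= 1
    return thinking


def toy_generate(prompt, thinking, max_new_tokens):
    """Stand-in model: thinks for three tokens, then tries to stop."""
    tokens = ["step"] * 3
    if max_new_tokens < len(tokens):
        return tokens[:max_new_tokens], False  # cut off by the budget
    return tokens, True


trace = budget_forced_thinking(toy_generate, "question", max_thinking_tokens=100,
                               num_extensions=2)
print(trace.count("Wait"), len(trace))  # → 2 11
```

Setting `num_extensions=2` corresponds to the “2x” rows of Table 4, and changing `extend_str` corresponds to trying strings other than “Wait”.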

Class-conditional control

We provide benchmark scores for this method in §E.1 and summarize three findings here: (1) Token-conditional control fails without budget forcing, as our model cannot reliably count tokens - even when trained to do so. (2) Under step-conditional control, the model generates a similar total number of tokens when given different step targets, as the model goes from few steps with many tokens per step, to many steps with few tokens in each step. Thus, the model learns to hack its way around the compute constraint, making the controllability of this method mediocre. (3) Class-conditional control can work - telling a model to simply think longer can increase its test-time compute and performance, which leads to good scaling in Table 3.

Table 4: Budget forcing extrapolation ablations. We compare ignoring the end-of-thinking delimiter twice and appending none or various strings.
Model                 AIME 2024   MATH 500   GPQA Diamond
No extrapolation      50.0        93.0       57.6
2x without string     50.0        90.2       55.1
2x “Alternatively”    50.0        92.2       59.6
2x “Hmm”              50.0        93.0       59.6
2x “Wait”             53.3        93.0       59.6
Figure 6: Rejection sampling on AIME24 with s1-32B. We sample with a temperature of 1 until all generations have fewer than (from left to right) 3500, 4000, 5000, 8000, and 16000 thinking tokens, requiring an average of 655, 97, 8, 3, 2, and 1 tries per sample.

Rejection sampling

Surprisingly, we find that simply sampling until the generation fits a specific length leads to an inverse scaling trend as depicted in Figure 6. In §E.2 we inspect a question, which was answered correctly by the model when rejection sampling for $\leq 4000$, but not for the $\leq 8000$ token setting. In the $\leq 4000$ setting the model directly jumps to the correct approach, while for the $\leq 8000$ setting it backtracks a lot. We hypothesize that there is a correlation such that shorter generations tend to be the ones where the model was on the right track from the start, whereas longer ones tend to be ones where the model made mistakes and thus backtracks or questions itself. This leads to longer samples often being wrong when rejection sampling and thus the inverse scaling trend.
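The rejection sampling baseline amounts to a simple resampling loop. A minimal sketch follows, assuming a hypothetical `sample_fn` that returns an answer together with its thinking-token count; the name, return shape, and `max_tries` cap are illustrative choices.

```python
def rejection_sample(sample_fn, max_thinking_tokens, max_tries=1000):
    """Resample until a generation's thinking-token count fits the budget.

    `sample_fn()` is a hypothetical callable returning (answer, num_thinking_tokens).
    Returns (answer, tries); answer is None if nothing fit within `max_tries`.
    """
    for tries in range(1, max_tries + 1):
        answer, num_tokens = sample_fn()
        if num_tokens <= max_thinking_tokens:
            return answer, tries
    return None, max_tries


# Toy sampler that replays fixed generation lengths instead of calling a model.
lengths = iter([9000, 5000, 3000])
answer, tries = rejection_sample(lambda: ("candidate", next(lengths)), 4000)
print(answer, tries)  # → candidate 3
```

Because the loop only filters on length, any link between length and correctness in the model's samples carries over directly to the accepted generations, which is consistent with the hypothesis above.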

6 Discussion and related work

6.1 Sample-efficient reasoning

Models

There are a number of concurrent efforts to build models that replicate the performance of o1 (OpenAI, 2024). For example, DeepSeek-r1 and k1.5 (DeepSeek-AI et al., 2025; Team et al., 2025) are built with reinforcement learning methods, while others rely on SFT using tens of thousands of distilled examples (Team, 2025; Xu et al., 2025; Labs, 2025). We show that SFT on only 1,000 examples suffices to build a competitive reasoning model matching o1-preview and produces a model that lies on the Pareto frontier (Figure 2). Further, we introduce budget forcing, which combined with our reasoning model leads to the first reproduction of OpenAI’s test-time scaling curves (OpenAI, 2024). Why does supervised finetuning on just 1,000 samples lead to such performance gains? We hypothesize that the model is already exposed to large amounts of reasoning data during pretraining, which spans trillions of tokens. Thus, the ability to perform reasoning is already present in our model. Our sample-efficient finetuning stage just activates it and we scale it further at test time with budget forcing. This is similar to the “Superficial Alignment Hypothesis” presented in LIMA (Zhou et al., 2023), where the authors find that 1,000 examples can be sufficient to align a model to adhere to user preferences.

Benchmarks and methods

To evaluate and push the limits of these models, increasingly challenging benchmarks have been introduced, such as Olympiad-level science competitions (He et al., 2024; Jain et al., 2024; Zhong et al., 2023) and others (Srivastava et al., 2023; Glazer et al., 2024; Su et al., 2024; Kim et al., 2024; Phan et al., 2025). To enhance models’ performance on reasoning-related tasks, researchers have pursued several strategies: Prior works have explored continuing training language models on specialized corpora related to mathematics and science (Azerbayev et al., 2023; Yang et al., 2024), sometimes even synthetically generated data (Yu et al., 2024). Others have developed training methodologies specifically aimed at reasoning performance (Zelikman et al., 2022, 2024; Luo et al., 2025; Yuan et al., 2025; Wu et al., 2024a). Another significant line of work focuses on prompting-based methods to elicit and improve reasoning abilities, including methods like Chain-of-Thought prompting (Wei et al., 2023; Yao et al., 2023a, b; Bi et al., 2023; Fu et al., 2023; Zhang et al., 2024b; Xiang et al., 2025; Hu et al., 2024; Diao et al., 2024). These combined efforts aim to advance the reasoning ability of language models, enabling them to handle more complex and abstract tasks effectively.

6.2 Test-time scaling

Methods

As we introduce in §3, we differentiate two methods to scale test-time compute: parallel and sequential. The former relies on multiple solution attempts generated in parallel and selecting the best outcome via specific criteria. These criteria include choosing the most frequent response for majority voting or the best response based on an external reward for Best-of-N (Brown et al., 2024; Irvine et al., 2023; Levi, 2024). Unlike repeated sampling, previous sequential scaling methods let the model generate solution attempts sequentially based on previous attempts, allowing it to refine each attempt based on previous outcomes (Snell et al., 2024; Hou et al., 2025; Lee et al., 2025). Tree-based search methods (Gandhi et al., 2024; Wu et al., 2024b) offer a hybrid approach between sequential and parallel scaling, such as Monte-Carlo Tree Search (MCTS) (Liu et al., 2024; Zhang et al., 2023; Zhou et al., 2024; Choi et al., 2023) and guided beam search (Xie et al., 2023). REBASE (Wu et al., 2024b) employs a process reward model to balance exploitation and pruning during tree search. Empirically, REBASE has been shown to outperform sampling-based methods and MCTS (Wu et al., 2024b). Reward models (Lightman et al., 2023; Wang et al., 2024b, c) play a key role in these methods. They come in two variants: outcome reward models and process reward models. Outcome reward models (Xin et al., 2024; Ankner et al., 2024) assign a score to complete solutions and are particularly useful in Best-of-N selection, while process reward models (Lightman et al., 2023; Wang et al., 2024b; Wu et al., 2024b) assess individual reasoning steps and are effective in guiding tree-based search methods.
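The two parallel selection criteria mentioned above, majority voting and Best-of-N, can be sketched in a few lines. The function names and the `reward_fn` argument are illustrative; in practice the reward would come from an outcome reward model rather than a simple function.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, reward_fn):
    """Return the answer with the highest score under an external reward function."""
    return max(answers, key=reward_fn)


# Four hypothetical final answers sampled in parallel for one question.
samples = ["809", "810", "809", "404"]
print(majority_vote(samples))  # → 809
```

Majority voting needs only the final answers, whereas Best-of-N depends on the quality of the external reward signal.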

Limits to further test-time scaling

We have shown that budget forcing allows extrapolating test-time compute in §4, e.g., improving AIME24 performance from 50% to 57%. However, it has two key limitations when scaling further: it eventually flattens out (Figure 4), and the context window of the underlying language model constrains it. Despite these constraints, our work shows test-time scaling across a wide range of accuracies (Figure 1), partly because scaling down test-time compute behaves predictably and does not suffer from these constraints.

Continuing test-time scaling will require approaches that can further extrapolate test-time compute. How can we get such extrapolation? There may be improvements to budget forcing such as rotating through different strings, not only “Wait”, or combining it with frequency penalties or higher temperature to avoid repetitive loops. An exciting direction for future work is also researching whether applying budget forcing to a reasoning model trained with reinforcement learning yields better extrapolation; or if RL allows for new ways of test-time scaling beyond budget forcing. Our work defines the right metrics (§3.2) – Control, Scaling, and Performance – to enable future research and progress on extrapolating test-time compute.

Figure 7: Scaling further with parallel scaling methods. All metrics averaged over the 30 questions in AIME24. Average thinking tokens for REBASE do not account for the additional compute from the reward model. For sequential scaling, we prompt the model to use up to (from left to right) 32, 64, 256, and 512 steps. For REBASE and majority voting we generate 16 parallel trajectories to aggregate across.

Parallel scaling as a solution

Parallel scaling offers one solution to the limits of sequential scaling, so we augment our sequentially scaled model with two methods: (I) Majority voting: After generating $k$ solutions, the final solution is the most frequent one across generations; (II) Tree search via REBASE: We use the REBASE process reward model, which is initialized from LLaMA-34B and further finetuned on a synthetic process reward modeling dataset (Wu et al., 2024b). We then aggregate the solutions generated by REBASE via majority voting. As shown in Figure 7, augmenting our model with REBASE scales better than majority voting, and even sequential scaling in this scenario. However, REBASE requires an additional forward pass at each step for the reward model, adding some computation overhead. For sequential scaling, when prompted to use up to 512 steps, the model generates a response that exceeds the context window for 12 out of the 30 evaluation questions, leading to a large performance drop. Overall, we find that these parallel scaling methods complement sequential scaling and thus offer an avenue for scaling test-time compute even further, beyond fixed context windows.

Impact Statement

Language models with strong reasoning capabilities have the potential to greatly enhance human productivity, from assisting in complex decision-making to driving scientific breakthroughs. However, recent advances in reasoning, such as OpenAI’s o1 and DeepSeek’s r1, lack transparency, limiting broader research progress. Our work aims to push the frontier of reasoning in a fully open manner, fostering innovation and collaboration to accelerate advancements that ultimately benefit society.

Acknowledgements

We thank Ryan Marten for generating traces from DeepSeek r1 for s1.1 using Bespoke Curator (Marten et al., 2025). This work was partly conducted using the Stanford Marlowe GPU cluster (Kapfer et al., 2025), made possible by financial support from Stanford University. We thank Alexander M. Rush, Andrew Ilyas, Banghua Zhu, Chenglei Si, Chunting Zhou, John Yang, Ludwig Schmidt, Samy Jelassi, Suhas Kotha, Tengyu Ma, Xuechen Li, Yu Sun, and Yue Zhang for very constructive discussions.

References

  • Ankner et al. (2024) Ankner, Z., Paul, M., Cui, B., Chang, J. D., and Ammanabrolu, P. Critique-out-loud reward models, 2024. URL https://arxiv.org/abs/2408.11791.
  • Arora et al. (2023) Arora, D., Singh, H. G., and Mausam. Have llms advanced enough? a challenging problem solving benchmark for large language models, 2023. URL https://arxiv.org/abs/2305.15074.
  • Azerbayev et al. (2023) Azerbayev, Z., Schoelkopf, H., Paster, K., Santos, M. D., McAleer, S., Jiang, A. Q., Deng, J., Biderman, S., and Welleck, S. Llemma: An open language model for mathematics, 2023.
  • Bi et al. (2023) Bi, Z., Zhang, N., Jiang, Y., Deng, S., Zheng, G., and Chen, H. When do program-of-thoughts work for reasoning?, 2023. URL https://arxiv.org/abs/2308.15452.
  • Biderman et al. (2024) Biderman, S., Schoelkopf, H., Sutawika, L., Gao, L., Tow, J., Abbasi, B., Aji, A. F., Ammanamanchi, P. S., Black, S., Clive, J., DiPofi, A., Etxaniz, J., Fattori, B., Forde, J. Z., Foster, C., Hsu, J., Jaiswal, M., Lee, W. Y., Li, H., Lovering, C., Muennighoff, N., Pavlick, E., Phang, J., Skowron, A., Tan, S., Tang, X., Wang, K. A., Winata, G. I., Yvon, F., and Zou, A. Lessons from the trenches on reproducible evaluation of language models, 2024.
  • Brown et al. (2024) Brown, B., Juravsky, J., Ehrlich, R., Clark, R., Le, Q. V., Ré, C., and Mirhoseini, A. Large language monkeys: Scaling inference compute with repeated sampling, 2024. URL https://arxiv.org/abs/2407.21787.
  • Cesista (2024) Cesista, F. L. Multimodal structured generation: Cvpr’s 2nd mmfm challenge technical report, 2024. URL https://arxiv.org/abs/2406.11403.
  • Chen et al. (2023) Chen, W., Yin, M., Ku, M., Lu, P., Wan, Y., Ma, X., Xu, J., Wang, X., and Xia, T. Theoremqa: A theorem-driven question answering dataset, 2023. URL https://arxiv.org/abs/2305.12524.
  • Choi et al. (2023) Choi, S., Fang, T., Wang, Z., and Song, Y. Kcts: Knowledge-constrained tree search decoding with token-level hallucination detection, 2023. URL https://arxiv.org/abs/2310.09044.
  • DeepSeek-AI et al. (2025)
    DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z. F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., Xue, B., Wang, B., Wu, B., Feng, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., Dai, D., Chen, D., Ji, D., Li, E., Lin, F., Dai, F., Luo, F., Hao, G., Chen, G., Li, G., Zhang, H., Bao, H., Xu, H., Wang, H., Ding, H., Xin, H., Gao, H., Qu, H., Li, H., Guo, J., Li, J., Wang, J., Chen, J., Yuan, J., Qiu, J., Li, J., Cai, J. L., Ni, J., Liang, J., Chen, J., Dong, K., Hu, K., Gao, K., Guan, K., Huang, K., Yu, K., Wang, L., Zhang, L., Zhao, L., Wang, L., Zhang, L., Xu, L., Xia, L., Zhang, M., Zhang, M., Tang, M., Li, M., Wang, M., Li, M., Tian, N., Huang, P., Zhang, P., Wang, Q., Chen, Q., Du, Q., Ge, R., Zhang, R., Pan, R., Wang, R., Chen, R. J., Jin, R. L., Chen, R., Lu, S., Zhou, S., Chen, S., Ye, S., Wang, S., Yu, S., Zhou, S., Pan, S., Li, S. S., Zhou, S., Wu, S., Ye, S., Yun, T., Pei, T., Sun, T., Wang, T., Zeng, W., Zhao, W., Liu, W., Liang, W., Gao, W., Yu, W., Zhang, W., Xiao, W. L., An, W., Liu, X., Wang, X., Chen, X., Nie, X., Cheng, X., Liu, X., Xie, X., Liu, X., Yang, X., Li, X., Su, X., Lin, X., Li, X. Q., Jin, X., Shen, X., Chen, X., Sun, X., Wang, X., Song, X., Zhou, X., Wang, X., Shan, X., Li, Y. K., Wang, Y. Q., Wei, Y. X., Zhang, Y., Xu, Y., Li, Y., Zhao, Y., Sun, Y., Wang, Y., Yu, Y., Zhang, Y., Shi, Y., Xiong, Y., He, Y., Piao, Y., Wang, Y., Tan, Y., Ma, Y., Liu, Y., Guo, Y., Ou, Y., Wang, Y., Gong, Y., Zou, Y., He, Y., Xiong, Y., Luo, Y., You, Y., Liu, Y., Zhou, Y., Zhu, Y. X., Xu, Y., Huang, Y., Li, Y., Zheng, Y., Zhu, Y., Ma, Y., Tang, Y., Zha, Y., Yan, Y., Ren, Z. Z., Ren, Z., Sha, Z., Fu, Z., Xu, Z., Xie, Z., Zhang, Z., Hao, Z., Ma, Z., Yan, Z., Wu, Z., Gu, Z., Zhu, Z., Liu, Z., Li, Z., Xie, Z., Song, Z., Pan, Z., Huang, Z., Xu, Z., Zhang, Z., and Zhang, Z. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning, 2025. 
    URL https://arxiv.org/abs/2501.12948.
  • Diao et al. (2024) Diao, S., Wang, P., Lin, Y., Pan, R., Liu, X., and Zhang, T. Active prompting with chain-of-thought for large language models, 2024. URL https://arxiv.org/abs/2302.12246.
  • Dubey et al. (2024) Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., Rodriguez, A., Gregerson, A., et al. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783.
  • Fu et al. (2023) Fu, Y., Peng, H., Sabharwal, A., Clark, P., and Khot, T. Complexity-based prompting for multi-step reasoning, 2023. URL https://arxiv.org/abs/2210.00720.
  • Gandhi et al. (2024) Gandhi, K., Lee, D., Grand, G., Liu, M., Cheng, W., Sharma, A., and Goodman, N. D. Stream of search (sos): Learning to search in language, 2024. URL https://arxiv.org/abs/2404.03683.
  • Gao et al. (2024a) Gao, B., Song, F., Yang, Z., Cai, Z., Miao, Y., Dong, Q., Li, L., Ma, C., Chen, L., Xu, R., Tang, Z., Wang, B., Zan, D., Quan, S., Zhang, G., Sha, L., Zhang, Y., Ren, X., Liu, T., and Chang, B. Omni-math: A universal olympiad level mathematic benchmark for large language models, 2024a. URL https://arxiv.org/abs/2410.07985.
  • Gao et al. (2021) Gao, L., Tow, J., Biderman, S., Black, S., DiPofi, A., Foster, C., Golding, L., Hsu, J., McDonell, K., Muennighoff, N., Phang, J., Reynolds, L., Tang, E., Thite, A., Wang, B., Wang, K., and Zou, A. A framework for few-shot language model evaluation, September 2021. URL https://doi.org/10.5281/zenodo.5371628.
  • Gao et al. (2024b) Gao, Z., Niu, B., He, X., Xu, H., Liu, H., Liu, A., Hu, X., and Wen, L. Interpretable contrastive monte carlo tree search reasoning, 2024b. URL https://arxiv.org/abs/2410.01707.
  • Glazer et al. (2024) Glazer, E., Erdil, E., Besiroglu, T., Chicharro, D., Chen, E., Gunning, A., Olsson, C. F., Denain, J.-S., Ho, A., de Oliveira Santos, E., Järviniemi, O., Barnett, M., Sandler, R., Vrzala, M., Sevilla, J., Ren, Q., Pratt, E., Levine, L., Barkley, G., Stewart, N., Grechuk, B., Grechuk, T., Enugandla, S. V., and Wildon, M. Frontiermath: A benchmark for evaluating advanced mathematical reasoning in ai, 2024. URL https://arxiv.org/abs/2411.04872.
  • Google (2024) Google. Gemini 2.0 flash thinking mode (gemini-2.0-flash-thinking-exp-1219), December 2024. URL https://cloud.google.com/vertex-ai/generative-ai/docs/thinking-mode.
  • Groeneveld et al. (2024) Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A. H., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K. R., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M. E., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Strubell, E., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Zettlemoyer, L., Dodge, J., Lo, K., Soldaini, L., Smith, N. A., and Hajishirzi, H. Olmo: Accelerating the science of language models, 2024.
  • He et al. (2024) He, C., Luo, R., Bai, Y., Hu, S., Thai, Z. L., Shen, J., Hu, J., Han, X., Huang, Y., Zhang, Y., Liu, J., Qi, L., Liu, Z., and Sun, M. Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems, 2024. URL https://arxiv.org/abs/2402.14008.
  • Hendrycks et al. (2021) Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., and Steinhardt, J. Measuring mathematical problem solving with the math dataset, 2021. URL https://arxiv.org/abs/2103.03874.
  • Hoffmann et al. (2022) Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., Rae, J. W., Vinyals, O., and Sifre, L. Training compute-optimal large language models, 2022. URL https://arxiv.org/abs/2203.15556.
  • Hou et al. (2025) Hou, Z., Lv, X., Lu, R., Zhang, J., Li, Y., Yao, Z., Li, J., Tang, J., and Dong, Y. Advancing language model reasoning through reinforcement learning and inference scaling, 2025. URL https://arxiv.org/abs/2501.11651.
  • Hu et al. (2024) Hu, Y., Shi, W., Fu, X., Roth, D., Ostendorf, M., Zettlemoyer, L., Smith, N. A., and Krishna, R. Visual sketchpad: Sketching as a visual chain of thought for multimodal language models, 2024. URL https://arxiv.org/abs/2406.09403.
  • Huang et al. (2024a) Huang, Z., Wang, Z., Xia, S., Li, X., Zou, H., Xu, R., Fan, R.-Z., Ye, L., Chern, E., Ye, Y., Zhang, Y., Yang, Y., Wu, T., Wang, B., Sun, S., Xiao, Y., Li, Y., Zhou, F., Chern, S., Qin, Y., Ma, Y., Su, J., Liu, Y., Zheng, Y., Zhang, S., Lin, D., Qiao, Y., and Liu, P. Olympicarena: Benchmarking multi-discipline cognitive reasoning for superintelligent ai, 2024a. URL https://arxiv.org/abs/2406.12753.
  • Huang et al. (2024b) Huang, Z., Zou, H., Li, X., Liu, Y., Zheng, Y., Chern, E., Xia, S., Qin, Y., Yuan, W., and Liu, P. O1 replication journey – part 2: Surpassing o1-preview through simple distillation, big progress or bitter lesson?, 2024b. URL https://arxiv.org/abs/2411.16489.
  • Huang et al. (2025) Huang, Z., Geng, G., Hua, S., Huang, Z., Zou, H., Zhang, S., Liu, P., and Zhang, X. O1 replication journey – part 3: Inference-time scaling for medical reasoning, 2025. URL https://arxiv.org/abs/2501.06458.
  • Irvine et al. (2023) Irvine, R., Boubert, D., Raina, V., Liusie, A., Zhu, Z., Mudupalli, V., Korshuk, A., Liu, Z., Cremer, F., Assassi, V., Beauchamp, C.-C., Lu, X., Rialan, T., and Beauchamp, W. Rewarding chatbots for real-world engagement with millions of users, 2023. URL https://arxiv.org/abs/2303.06135.
  • Jain et al. (2024) Jain, N., Han, K., Gu, A., Li, W.-D., Yan, F., Zhang, T., Wang, S., Solar-Lezama, A., Sen, K., and Stoica, I. Livecodebench: Holistic and contamination free evaluation of large language models for code, 2024. URL https://arxiv.org/abs/2403.07974.
  • Kapfer et al. (2025) Kapfer, C., Stine, K., Narasimhan, B., Mentzel, C., and Candes, E. Marlowe: Stanford’s gpu-based computational instrument, January 2025. URL https://doi.org/10.5281/zenodo.14751899.
  • Kaplan et al. (2020) Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. Scaling laws for neural language models, 2020. URL https://arxiv.org/abs/2001.08361.
  • Kim et al. (2024) Kim, E., Suk, J., Kim, S., Muennighoff, N., Kim, D., and Oh, A. Llm-as-an-interviewer: Beyond static testing through dynamic llm evaluation, 2024. URL https://arxiv.org/abs/2412.10424.
  • Kwon et al. (2023) Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., and Stoica, I. Efficient memory management for large language model serving with pagedattention, 2023. URL https://arxiv.org/abs/2309.06180.
  • Labs (2025) Labs, B. Bespoke-stratos: The unreasonable effectiveness of reasoning distillation, 2025. URL https://hf.co/bespokelabs/Bespoke-Stratos-32B. Accessed: 2025-01-22.
  • Lee et al. (2025) Lee, K.-H., Fischer, I., Wu, Y.-H., Marwood, D., Baluja, S., Schuurmans, D., and Chen, X. Evolving deeper llm thinking, 2025. URL https://arxiv.org/abs/2501.09891.
  • Levi (2024) Levi, N. A simple model of inference scaling laws, 2024. URL https://arxiv.org/abs/2410.16377.
  • LI et al. (2024) LI, J., Beeching, E., Tunstall, L., Lipkin, B., Soletskyi, R., Huang, S. C., Rasul, K., Yu, L., Jiang, A., Shen, Z., Qin, Z., Dong, B., Zhou, L., Fleureau, Y., Lample, G., and Polu, S. Numinamath, 2024. URL https://github.com/project-numina/aimo-progress-prize/blob/main/report/numina_dataset.pdf.
  • Lightman et al. (2023) Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., and Cobbe, K. Let’s verify step by step, 2023. URL https://arxiv.org/abs/2305.20050.
  • Ling et al. (2017) Ling, W., Yogatama, D., Dyer, C., and Blunsom, P. Program induction by rationale generation : Learning to solve and explain algebraic word problems, 2017. URL https://arxiv.org/abs/1705.04146.
  • Liu et al. (2020) Liu, J., Cui, L., Liu, H., Huang, D., Wang, Y., and Zhang, Y. Logiqa: A challenge dataset for machine reading comprehension with logical reasoning, 2020. URL https://arxiv.org/abs/2007.08124.
  • Liu et al. (2024) Liu, J., Cohen, A., Pasunuru, R., Choi, Y., Hajishirzi, H., and Celikyilmaz, A. Don’t throw away your value model! generating more preferable text with value-guided monte-carlo tree search decoding, 2024. URL https://arxiv.org/abs/2309.15028.
  • Loshchilov & Hutter (2019) Loshchilov, I. and Hutter, F. Decoupled weight decay regularization, 2019.
  • Luo et al. (2025) Luo, H., Sun, Q., Xu, C., Zhao, P., Lou, J., Tao, C., Geng, X., Lin, Q., Chen, S., Tang, Y., and Zhang, D. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct, 2025. URL https://arxiv.org/abs/2308.09583.
  • Marten et al. (2025) Marten, R., Vu, T., Ji, C. C.-J., Sharma, K., Pimpalgaonkar, S., Dimakis, A., and Sathiamoorthy, M. Curator: A tool for synthetic data creation. https://github.com/bespokelabsai/curator, January 2025.
  • Muennighoff et al. (2024) Muennighoff, N., Soldaini, L., Groeneveld, D., Lo, K., Morrison, J., Min, S., Shi, W., Walsh, P., Tafjord, O., Lambert, N., Gu, Y., Arora, S., Bhagia, A., Schwenk, D., Wadden, D., Wettig, A., Hui, B., Dettmers, T., Kiela, D., Farhadi, A., Smith, N. A., Koh, P. W., Singh, A., and Hajishirzi, H. Olmoe: Open mixture-of-experts language models, 2024. URL https://arxiv.org/abs/2409.02060.
  • of America (2024) of America, M. A. Aime, February 2024. URL https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions/.
  • OpenAI (2024) OpenAI. Learning to reason with llms, September 2024. URL https://openai.com/index/learning-to-reason-with-llms/.
  • OpenAI (2025) OpenAI. Openai o3-mini, 2025. URL https://openai.com/index/openai-o3-mini/. Accessed: 2025-02-24.
  • Phan et al. (2025) Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., Shi, S., Choi, M., Agrawal, A., Chopra, A., et al. Humanity’s last exam, 2025. URL https://arxiv.org/abs/2501.14249.
  • Qin et al. (2024) Qin, Y., Li, X., Zou, H., Liu, Y., Xia, S., Huang, Z., Ye, Y., Yuan, W., Liu, H., Li, Y., and Liu, P. O1 replication journey: A strategic progress report – part 1, 2024. URL https://arxiv.org/abs/2410.18982.
  • Qwen et al. (2024) Qwen, :, Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., Lin, H., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Lin, J., Dang, K., Lu, K., Bao, K., Yang, K., Yu, L., Li, M., Xue, M., Zhang, P., Zhu, Q., Men, R., Lin, R., Li, T., Xia, T., Ren, X., Ren, X., Fan, Y., Su, Y., Zhang, Y., Wan, Y., Liu, Y., Cui, Z., Zhang, Z., and Qiu, Z. Qwen2.5 technical report, 2024. URL https://arxiv.org/abs/2412.15115.
  • Rein et al. (2023) Rein, D., Hou, B. L., Stickland, A. C., Petty, J., Pang, R. Y., Dirani, J., Michael, J., and Bowman, S. R. Gpqa: A graduate-level google-proof q&a benchmark, 2023. URL https://arxiv.org/abs/2311.12022.
  • Shi et al. (2024) Shi, Q., Tang, M., Narasimhan, K., and Yao, S. Can language models solve olympiad programming?, 2024. URL https://arxiv.org/abs/2404.10952.
  • Snell et al. (2024) Snell, C., Lee, J., Xu, K., and Kumar, A. Scaling llm test-time compute optimally can be more effective than scaling model parameters, 2024. URL https://arxiv.org/abs/2408.03314.
  • Srivastava et al. (2023) Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., Brown, A. R., Santoro, A., Gupta, A., Garriga-Alonso, A., et al. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2023.
  • Su et al. (2024) Su, H., Yen, H., Xia, M., Shi, W., Muennighoff, N., yu Wang, H., Liu, H., Shi, Q., Siegel, Z. S., Tang, M., Sun, R., Yoon, J., Arik, S. O., Chen, D., and Yu, T. Bright: A realistic and challenging benchmark for reasoning-intensive retrieval, 2024. URL https://arxiv.org/abs/2407.12883.
  • Sun et al. (2024) Sun, L., Han, Y., Zhao, Z., Ma, D., Shen, Z., Chen, B., Chen, L., and Yu, K. Scieval: A multi-level large language model evaluation benchmark for scientific research, 2024. URL https://arxiv.org/abs/2308.13149.
  • Team et al. (2025) Team, K., Du, A., Gao, B., Xing, B., Jiang, C., Chen, C., Li, C., Xiao, C., Du, C., Liao, C., Tang, C., Wang, C., Zhang, D., Yuan, E., Lu, E., Tang, F., Sung, F., Wei, G., Lai, G., Guo, H., Zhu, H., Ding, H., Hu, H., Yang, H., Zhang, H., Yao, H., Zhao, H., Lu, H., Li, H., Yu, H., Gao, H., Zheng, H., Yuan, H., Chen, J., Guo, J., Su, J., Wang, J., Zhao, J., Zhang, J., Liu, J., Yan, J., Wu, J., Shi, L., Ye, L., Yu, L., Dong, M., Zhang, N., Ma, N., Pan, Q., Gong, Q., Liu, S., Ma, S., Wei, S., Cao, S., Huang, S., Jiang, T., Gao, W., Xiong, W., He, W., Huang, W., Wu, W., He, W., Wei, X., Jia, X., Wu, X., Xu, X., Zu, X., Zhou, X., Pan, X., Charles, Y., Li, Y., Hu, Y., Liu, Y., Chen, Y., Wang, Y., Liu, Y., Qin, Y., Liu, Y., Yang, Y., Bao, Y., Du, Y., Wu, Y., Wang, Y., Zhou, Z., Wang, Z., Li, Z., Zhu, Z., Zhang, Z., Wang, Z., Yang, Z., Huang, Z., Huang, Z., Xu, Z., and Yang, Z. Kimi k1.5: Scaling reinforcement learning with llms, 2025. URL https://arxiv.org/abs/2501.12599.
  • Team (2025) Team, N. Sky-t1: Fully open-source reasoning model with o1-preview performance in $450 budget, 2025. URL https://novasky-ai.github.io/posts/sky-t1. Accessed: 2025-01-09.
  • Team (2024) Team, Q. Qwq: Reflect deeply on the boundaries of the unknown, November 2024. URL https://qwenlm.github.io/blog/qwq-32b-preview/.
  • Wang et al. (2024a) Wang, J., Meng, F., Liang, Y., and Zhou, J. Drt-o1: Optimized deep reasoning translation via long chain-of-thought, 2024a. URL https://arxiv.org/abs/2412.17498.
  • Wang et al. (2024b) Wang, P., Li, L., Shao, Z., Xu, R. X., Dai, D., Li, Y., Chen, D., Wu, Y., and Sui, Z. Math-shepherd: Verify and reinforce llms step-by-step without human annotations, 2024b. URL https://arxiv.org/abs/2312.08935.
  • Wang et al. (2021) Wang, S., Liu, Z., Zhong, W., Zhou, M., Wei, Z., Chen, Z., and Duan, N. From lsat: The progress and challenges of complex reasoning, 2021. URL https://arxiv.org/abs/2108.00648.
  • Wang et al. (2024c) Wang, Z., Dong, Y., Delalleau, O., Zeng, J., Shen, G., Egert, D., Zhang, J. J., Sreedhar, M. N., and Kuchaiev, O. Helpsteer2: Open-source dataset for training top-performing reward models, 2024c. URL https://arxiv.org/abs/2406.08673.
  • Wei et al. (2023) Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. Chain-of-thought prompting elicits reasoning in large language models, 2023. URL https://arxiv.org/abs/2201.11903.
  • Welleck et al. (2024) Welleck, S., Bertsch, A., Finlayson, M., Schoelkopf, H., Xie, A., Neubig, G., Kulikov, I., and Harchaoui, Z. From decoding to meta-generation: Inference-time algorithms for large language models, 2024. URL https://arxiv.org/abs/2406.16838.
  • Wu et al. (2024a) Wu, T., Lan, J., Yuan, W., Jiao, J., Weston, J., and Sukhbaatar, S. Thinking llms: General instruction following with thought generation, 2024a. URL https://arxiv.org/abs/2410.10630.
  • Wu et al. (2024b) Wu, Y., Sun, Z., Li, S., Welleck, S., and Yang, Y. Inference scaling laws: An empirical analysis of compute-optimal inference for problem-solving with language models, 2024b. URL https://arxiv.org/abs/2408.00724.
  • Xiang et al. (2025) Xiang, V., Snell, C., Gandhi, K., Albalak, A., Singh, A., Blagden, C., Phung, D., Rafailov, R., Lile, N., Mahan, D., Castricato, L., Franken, J.-P., Haber, N., and Finn, C. Towards system 2 reasoning in llms: Learning how to think with meta chain-of-thought, 2025. URL https://arxiv.org/abs/2501.04682.
  • Xie et al. (2023) Xie, Y., Kawaguchi, K., Zhao, Y., Zhao, X., Kan, M.-Y., He, J., and Xie, Q. Self-evaluation guided beam search for reasoning, 2023. URL https://arxiv.org/abs/2305.00633.
  • Xin et al. (2024) Xin, H., Guo, D., Shao, Z., Ren, Z., Zhu, Q., Liu, B., Ruan, C., Li, W., and Liang, X. Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data, 2024. URL https://arxiv.org/abs/2405.14333.
  • Xu et al. (2025) Xu, H., Wu, X., Wang, W., Li, Z., Zheng, D., Chen, B., Hu, Y., Kang, S., Ji, J., Zhang, Y., Guo, Z., Yang, Y., Zhang, M., and Zhang, D. Redstar: Does scaling long-cot data unlock better slow-reasoning systems?, 2025. URL https://arxiv.org/abs/2501.11284.
  • Yang et al. (2024) Yang, Z., Band, N., Li, S., Candès, E., and Hashimoto, T. Synthetic continued pretraining, 2024. URL https://arxiv.org/abs/2409.07431.
  • Yao et al. (2023a) Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models, 2023a. URL https://arxiv.org/abs/2305.10601.
  • Yao et al. (2023b) Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. React: Synergizing reasoning and acting in language models, 2023b. URL https://arxiv.org/abs/2210.03629.
  • Ye et al. (2025a) Ye, Y., Huang, Z., Xiao, Y., Chern, E., Xia, S., and Liu, P. Limo: Less is more for reasoning, 2025a. URL https://arxiv.org/abs/2502.03387.
  • Ye et al. (2025b) Ye, Y., Xiao, Y., Mi, T., and Liu, P. Aime-preview: A rigorous and immediate evaluation framework for advanced mathematical reasoning. https://github.com/GAIR-NLP/AIME-Preview, 2025b. GitHub repository.
  • Yu et al. (2024) Yu, L., Jiang, W., Shi, H., Yu, J., Liu, Z., Zhang, Y., Kwok, J. T., Li, Z., Weller, A., and Liu, W. Metamath: Bootstrap your own mathematical questions for large language models, 2024. URL https://arxiv.org/abs/2309.12284.
  • Yuan et al. (2025) Yuan, S., Chen, Z., Xi, Z., Ye, J., Du, Z., and Chen, J. Agent-r: Training language model agents to reflect via iterative self-training, 2025. URL https://arxiv.org/abs/2501.11425.
  • Zelikman et al. (2022) Zelikman, E., Wu, Y., Mu, J., and Goodman, N. D. Star: Bootstrapping reasoning with reasoning, 2022. URL https://arxiv.org/abs/2203.14465.
  • Zelikman et al. (2024) Zelikman, E., Harik, G., Shao, Y., Jayasiri, V., Haber, N., and Goodman, N. D. Quiet-star: Language models can teach themselves to think before speaking, 2024. URL https://arxiv.org/abs/2403.09629.
  • Zhang & Chen (2024) Zhang, H. and Chen, C. Test-time compute scaling laws, 2024. URL https://github.com/hughbzhang/o1_inference_scaling_laws.
  • Zhang et al. (2023) Zhang, S., Chen, Z., Shen, Y., Ding, M., Tenenbaum, J. B., and Gan, C. Planning with large language models for code generation, 2023. URL https://arxiv.org/abs/2303.05510.
  • Zhang et al. (2024a) Zhang, Y., Wu, S., Yang, Y., Shu, J., Xiao, J., Kong, C., and Sang, J. o1-coder: an o1 replication for coding, 2024a. URL https://arxiv.org/abs/2412.00154.
  • Zhang et al. (2024b) Zhang, Y., Yang, J., Yuan, Y., and Yao, A. C.-C. Cumulative reasoning with large language models, 2024b. URL https://arxiv.org/abs/2308.04371.
  • Zhong et al. (2019) Zhong, H., Xiao, C., Tu, C., Zhang, T., Liu, Z., and Sun, M. Jec-qa: A legal-domain question answering dataset, 2019. URL https://arxiv.org/abs/1911.12011.
  • Zhong et al. (2023) Zhong, W., Cui, R., Guo, Y., Liang, Y., Lu, S., Wang, Y., Saied, A., Chen, W., and Duan, N. Agieval: A human-centric benchmark for evaluating foundation models, 2023. URL https://arxiv.org/abs/2304.06364.
  • Zhou et al. (2024) Zhou, A., Yan, K., Shlapentokh-Rothman, M., Wang, H., and Wang, Y.-X. Language agent tree search unifies reasoning acting and planning in language models, 2024. URL https://arxiv.org/abs/2310.04406.
  • Zhou et al. (2023) Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., Ma, X., Efrat, A., Yu, P., Yu, L., Zhang, S., Ghosh, G., Lewis, M., Zettlemoyer, L., and Levy, O. Lima: Less is more for alignment, 2023. URL https://arxiv.org/abs/2305.11206.

Appendix A s1.1

Seven days after our release of s1, we released s1.1. We regenerated traces for our 1,000 samples in s1K using DeepSeek r1 (DeepSeek-AI et al., 2025) to create s1K-1.1. We use the same training procedure to train our model s1.1. Other updates since our launch include the release of o3 (OpenAI, 2025), LIMO (Ye et al., 2025a), and AIME 2025. We consider all these new developments in Table 5. We find that s1.1 performs significantly better than s1. We also tried distilling from Claude 3.7, which led to worse performance than distilling from r1 (not reported).

Table 5: s1-32B is an open and sample-efficient reasoning model. We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4). Other results are from the respective reports (Qwen et al., 2024; Team, 2024; OpenAI, 2024, 2025; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025) except for AIME 2025 (Ye et al., 2025b). # Examples = number of examples used for reasoning finetuning; BF = budget forcing.
Model                                  # Examples  MATH500  GPQA  AIME 2024  AIME 2025
API only
o3-mini-low                            N.A.        95.8     70.6  56.3       42.1
o3-mini-medium                         N.A.        97.3     76.8  75.8       70.4
o3-mini-high                           N.A.        97.9     79.7  83.8       80.9
Open Weights
QwQ-32B                                N.A.        90.6     54.5  46.7       32.7
r1                                     ≫800K       97.3     71.5  79.8       70.0
r1-distill-Llama-70B                   800K        94.5     65.2  57.1       56.3
r1-distill-Qwen-14B                    800K        93.9     59.1  61.7       48.0
r1-distill-Qwen-32B                    800K        94.3     62.1  58.3       49.6
Open Weights and Open Data
LIMO                                   817         94.8     66.7  56.3       44.6
s1 w/o BF                              1K          92.6     56.6  50.0       26.7
s1 with Budget Forcing “Wait” 1x       1K          92.8     59.6  53.3       30.0
s1 with Budget Forcing “Wait” 2x       1K          93.0     59.6  53.3       33.3
s1 with Budget Forcing “Wait” 4x       1K          92.2     58.6  56.7       36.7
s1.1 w/o BF                            1K          94.4     60.6  56.7       50.0
s1.1 with Budget Forcing “Wait” 1x     1K          95.4     62.6  56.7       50.0
s1.1 with Budget Forcing “Wait” 2x     1K          95.4     63.6  56.7       50.0
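To make the “Wait” 1x/2x/4x rows concrete, the following is a minimal sketch of the lengthening side of budget forcing as the abstract describes it: each time the model tries to end its thinking, “Wait” is appended and generation continues. The names `budget_force`, `continue_thinking`, and `toy_model` are hypothetical; `continue_thinking` stands in for a real model call, and this is not the released s1 implementation.

```python
from typing import Callable


def budget_force(prompt: str,
                 continue_thinking: Callable[[str], str],
                 num_waits: int,
                 wait_token: str = "Wait") -> str:
    """Lengthen a reasoning trace by appending `wait_token` `num_waits` times.

    `continue_thinking` is a hypothetical stand-in for a model call: it takes
    the text so far and returns the continuation up to the point where the
    model would end its thinking.
    """
    trace = continue_thinking(prompt)
    for _ in range(num_waits):
        # The model tried to stop thinking: append "Wait" and let it continue.
        trace += wait_token
        trace += continue_thinking(prompt + trace)
    return trace


def toy_model(text_so_far: str) -> str:
    """Toy stand-in for a model so the sketch runs end to end."""
    segment = text_so_far.count("Wait")
    return f" [thinking segment {segment}] "


print(budget_force("Q: 2 + 2 = ?", toy_model, num_waits=2))
```

With `num_waits=0` the trace is returned unchanged, which corresponds to the “w/o BF” rows under the assumptions of this sketch.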

Appendix B Evaluation determinism

We run our evaluations using vLLM (Kwon et al., 2023) as it is faster than the alternatives we tried. However, we find that even when using the same random seeds and greedy sampling, evaluation scores can change significantly across runs:

  • Different batch sizes causing different results; see https://github.com/vllm-project/vllm/issues/5898
  • Continuing generations causing different results; see https://github.com/vllm-project/vllm/issues/11783
  • Changes in tensor parallelism causing different results

As our model generates long reasoning traces prior to its answer, small numeric changes can snowball into large differences. We encounter many generations that are exactly the same for thousands of tokens and then suddenly differ in one token, eventually ending up with an entirely different answer. To partly counter this issue, we generally run our final evaluations using full precision unless otherwise indicated.
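The snowballing effect can be illustrated with a small sketch. It shows one well-known source of tiny numeric differences, namely that floating-point addition is not associative, and a helper for locating the first token at which two generations diverge. Linking this to the batch-size and tensor-parallelism effects listed above is an assumption made for illustration, and the helper `first_divergence` and the example token lists are hypothetical.

```python
def first_divergence(tokens_a, tokens_b):
    """Return the index of the first position where two token sequences differ,
    or None if the sequences are identical."""
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    if len(tokens_a) != len(tokens_b):
        # One sequence is a strict prefix of the other.
        return min(len(tokens_a), len(tokens_b))
    return None


# Floating-point addition is not associative, so summing the same numbers in a
# different order can give a slightly different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # prints False

# Hypothetical token sequences from two runs of the same prompt.
run_1 = ["The", "answer", "is", "12"]
run_2 = ["The", "answer", "is", "21"]
print(first_divergence(run_1, run_2))  # prints 3
```

Comparing generations this way makes it easy to check whether two runs agree up to some token and only then split, as described above.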

Appendix C s1K details

C.1 s1K summary

Table 6: Summary of our dataset s1K. Token count measured by the Qwen-2.5 tokenizer. We prompt Claude to produce keywords given several questions from the domain.

| Domain | #questions | Total token count | Keywords |
|---|---|---|---|
| Geometry | 109 | 560.2K | Area, Triangle, Distance |
| Number theory | 98 | 522.5K | Sequences, Divisibility |
| Combinatorics | 75 | 384.7K | Permutations, Counting |
| Real functions | 43 | 234.8K | Trigonometry, Calculus |
| Biology | 41 | 120.9K | Organic reactions |
| Complex functions | 32 | 170.2K | Complex roots |
| Quantum theory | 32 | 127.9K | Particles, Wave functions |
| Field theory | 28 | 150.1K | Polynomials, Roots |
| Calculus of variations | 28 | 155.5K | Optimization, Control |
| Difference equations | 24 | 132.5K | Recurrence, Recursion |
| Electromagnetic theory | 23 | 95.8K | Optics, Waves, Diffraction |
| Group theory | 22 | 100.0K | Groups, Automorphisms |
| Linear algebra | 22 | 128.3K | Matrices, Determinants |
| Probability theory | 20 | 114.6K | Random walk, Expectation |
| Algebraic systems | 19 | 109.9K | Functional equations |
| Mechanics | 19 | 103.6K | Forces, Motion, Energy |
| Thermodynamics | 19 | 74.2K | Heat engines, Entropy |
| Differential equations | 18 | 89.6K | Substitution, Existence |
| Computer science | 18 | 34.2K | Complexity theory, Algorithms |
| Numerical analysis | 18 | 76.5K | Error analysis, Stability |
| Calculus | 17 | 96.3K | Convergence, Summation |
| Algebraic structures | 17 | 90.4K | Inequalities, Sets |
| Astronomy | 16 | 37.7K | Stellar populations, Orbits |
| Remaining 27 domains | 242 | 982.2K | Domains with ≤ 16 questions |
| All domains (51) | 1000 | 4.7M | s1K |

C.2 Dataset composition for full 59K questions

Table 7: Composition of full 59K questions. Thinking and response lengths are measured in tokens using the Qwen2.5-32B-Instruct tokenizer (Qwen et al., 2024). In addition to excluding our evaluation benchmark, AIME24, we also exclude AIME questions from 2022-2023 as we use these 90 questions during our development stage of s1-32B.

| Source | Description | #Samples | Avg. thinking length |
|---|---|---|---|
| NuminaMATH (LI et al., 2024) | Math problems from online websites | 30660 | 4.1K |
| MATH (Hendrycks et al., 2021) | Math problems from competitions | 11999 | 2.9K |
| OlympicArena (Huang et al., 2024a) | Astronomy, Biology, Chemistry, Computer Science, Geography, Math, and Physics olympiad questions | 4250 | 3.2K |
| OmniMath (Gao et al., 2024a) | Math problems from competitions | 4238 | 4.4K |
| AGIEval (Zhong et al., 2023; Ling et al., 2017; Hendrycks et al., 2021; Liu et al., 2020; Zhong et al., 2019; Wang et al., 2021) | English, Law, Logic and Math problems from the SAT, LSAT and other exams | 2385 | 1.2K |
| xword | Crossword puzzles | 999 | 0.7K |
| OlympiadBench (He et al., 2024) | Math and Physics olympiad questions | 896 | 3.9K |
| AIME (1983-2021) | American Invitational Mathematics Examination | 890 | 4.7K |
| TheoremQA (Chen et al., 2023) | Computer Science, Finance, Math, and Physics university-level questions relating to theorems | 747 | 2.1K |
| USACO (Shi et al., 2024) | Code problems from the USA Computing Olympiad | 519 | 3.6K |
| JEEBench (Arora et al., 2023) | Chemistry, Math, and Physics problems used in the university entrance examination of the Indian Institute of Technology | 515 | 2.9K |
| GPQA (Rein et al., 2023) | PhD-Level Science Questions | 348 | 2.9K |
| SciEval (Sun et al., 2024) | Biology, Chemistry, and Physics problems from various sources | 227 | 0.7K |
| s1-prob | Stanford statistics qualifying exams | 182 | 4.0K |
| LiveCodeBench (Jain et al., 2024) | Code problems from coding websites (LeetCode, AtCoder, and CodeForces) | 151 | 3.5K |
| s1-teasers | Math brain-teasers crawled from the Internet | 23 | 4.1K |
| All 59K questions | Composite of the above datasets with reasoning traces and solutions | 59029 | 3.6K |

C.3 s1K grading prompt

To grade whether an example is correct for our dataset selection in §2, we use the prompt in Figure 8. We grade using Claude 3.5, except for the correctness among the final 1,000 samples, which we grade with Claude 3.7.

You are an AI assistant for grading a science problem. The user will provide you with the question itself, an attempt made by a student and the correct answer to the problem. Your job is to judge whether the attempt is correct by comparing it with the correct answer. If the expected solution concludes with a number or choice, there should be no ambiguity. If the expected solution involves going through the entire reasoning process, you should judge the attempt based on whether the reasoning process is correct with correct answer if helpful.
The user will provide the attempt and the correct answer in the following format:
# Problem
{problem}
## Attempt
{attempt}
## Correct answer
{solution}
Explain your reasoning, and end your response on a new line with only "Yes" or "No" (without quotes).
Figure 8: Grading prompt.
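As a rough illustration of how the user-message part of this prompt could be filled in and how the grader's verdict could be read back, here is a minimal Python sketch. The names `GRADING_TEMPLATE`, `build_grading_message`, and `parse_verdict` are hypothetical and not taken from the s1 codebase, and the call to the grading model itself is deliberately left out.

```python
# User-message template following the format shown in Figure 8.
GRADING_TEMPLATE = (
    "# Problem\n{problem}\n\n"
    "## Attempt\n{attempt}\n\n"
    "## Correct answer\n{solution}\n\n"
    "Explain your reasoning, and end your response on a new line with only "
    '"Yes" or "No" (without quotes).'
)


def build_grading_message(problem, attempt, solution):
    """Fill the template with one problem, one attempt, and the reference answer."""
    return GRADING_TEMPLATE.format(problem=problem, attempt=attempt, solution=solution)


def parse_verdict(response):
    """Read the verdict from the last non-empty line of the grader's response.

    Returns True for "Yes", False for "No", and None if the last line is neither,
    so that malformed responses can be handled separately by the caller.
    """
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return None
    if lines[-1] == "Yes":
        return True
    if lines[-1] == "No":
        return False
    return None


message = build_grading_message("What is 2 + 2?", "The answer is 4.", "4")
verdict = parse_verdict("The attempt matches the reference answer.\nYes")
```

Returning `None` for an unparseable response, rather than defaulting to incorrect, is a design choice of this sketch; the paper does not say how such cases were handled.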

C.4 s1K diversity selection

Algorithm 1 Two-stage sampling for s1K
Input: $\mathcal{Q}$ := Set of 24,496 questions with features
Output: $\mathcal{S}$ := Set of 1,000 selected questions
$\mathcal{S} \leftarrow \emptyset$  ▷ Initialize the output set (only tracks unique elements)
for $q \in \mathcal{Q}$ do
    if IsGeminiCorrect($q$) and (IsAIME($q$) or IsGPQA($q$)) then
        $\mathcal{S} \leftarrow \mathcal{S} \cup \{q\}$  ▷ Select all correct AIME/GPQA solutions
    else if IsGeminiCorrect($q$) and IsMATH($q$) and ThinkingLength($q$) > 5600 then
        $\mathcal{S} \leftarrow \mathcal{S} \cup \{q\}$  ▷ Select correct MATH500 solutions with long chains
    end if
end for
$\mathcal{D} \leftarrow$ All available domains  ▷ Initialize domain pool
while $|\mathcal{S}| < 1000$ do
    $d \leftarrow$ RandomChoice($\mathcal{D}$)  ▷ Randomly select a domain
    $Q_d \leftarrow$ Questions in domain $d$  ▷ Get questions from this domain
    ranks $\leftarrow$ RankByThinkingLength($Q_d$)  ▷ Rank by thinking length
    weights $\leftarrow 2^{-\text{ranks}}$  ▷ Apply power-law weighting
    $q \leftarrow$ WeightedSample($Q_d$, weights)  ▷ Sample favoring longer chains
    $\mathcal{S} \leftarrow \mathcal{S} \cup \{q\}$  ▷ Add selected question
    $Q_d \leftarrow Q_d \setminus \{q\}$
    if $Q_d = \emptyset$ then
        $\mathcal{D} \leftarrow \mathcal{D} \setminus \{d\}$  ▷ Remove exhausted domains
    end if
end while

Algorithm 1 describes how we select data in our diversity selection stage. As mentioned in §2, we also include samples from some specific benchmarks we perceive as high-quality. None of the samples overlap with our final evaluation.
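The two stages of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the dictionary keys (`id`, `domain`, `source`, `gemini_correct`, `thinking_length`), the function name `select_s1k`, the choice of rank 0 for the longest trace, and the extra guard that stops when every domain is exhausted are all assumptions.

```python
import random


def select_s1k(questions, target_size=1000, min_math_len=5600, seed=0):
    """Two-stage selection over a list of question dicts (hypothetical schema)."""
    rng = random.Random(seed)
    selected = {}  # id -> question; a dict keeps entries unique and ordered

    # Stage 1: keep correct AIME/GPQA questions and correct MATH questions
    # whose thinking length exceeds the threshold.
    for q in questions:
        if not q["gemini_correct"]:
            continue
        if q["source"] in ("AIME", "GPQA"):
            selected[q["id"]] = q
        elif q["source"] == "MATH" and q["thinking_length"] > min_math_len:
            selected[q["id"]] = q

    # Stage 2: group all questions by domain, then repeatedly pick a random
    # domain and sample one question from it, favoring longer thinking.
    pools = {}
    for q in questions:
        pools.setdefault(q["domain"], []).append(q)

    # Assumption: also stop if all domains run out before reaching target_size.
    while len(selected) < target_size and pools:
        domain = rng.choice(sorted(pools))
        pool = pools[domain]
        # Rank 0 is the longest reasoning trace in this domain.
        pool.sort(key=lambda item: item["thinking_length"], reverse=True)
        weights = [2.0 ** -rank for rank in range(len(pool))]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        q = pool.pop(index)          # remove the sampled question from the pool
        selected[q["id"]] = q        # no-op if stage 1 already selected it
        if not pool:
            del pools[domain]        # drop exhausted domains

    return list(selected.values())


# Toy data: 50 questions over 5 domains, three of them from AIME.
toy_questions = [
    {
        "id": i,
        "domain": f"domain_{i % 5}",
        "source": "AIME" if i < 3 else "other",
        "gemini_correct": True,
        "thinking_length": 100 + i,
    }
    for i in range(50)
]
picked = select_s1k(toy_questions, target_size=20)
```

Because the weights halve with each rank, the longest trace in a domain is chosen about half the time, which matches the stated intent of favoring longer chains while still leaving room for shorter ones.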

C.5 Decontamination

We filter all samples by checking for 8-gram overlap between the selected examples and the evaluation benchmarks: MATH500, GPQA Diamond, and AIME24. We exclude any question whose overlap with a benchmark is an 8-gram or longer.
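A minimal sketch of such an n-gram filter is shown below. It assumes lowercased whitespace tokenization and treats any shared 8-gram as contamination; the text does not specify the tokenization or normalization actually used, and the function names `ngrams` and `decontaminate` are hypothetical.

```python
def ngrams(text, n=8):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(candidates, benchmark_questions, n=8):
    """Keep only candidates that share no n-gram with any benchmark question."""
    benchmark_ngrams = set()
    for question in benchmark_questions:
        benchmark_ngrams |= ngrams(question, n)
    return [c for c in candidates if ngrams(c, n).isdisjoint(benchmark_ngrams)]


benchmark = [
    "Find the number of positive integers less than one thousand divisible by seven"
]
candidates = [
    "Please find the number of positive integers less than one thousand please",
    "Compute the area of a triangle with sides three four and five units",
]
clean = decontaminate(candidates, benchmark)
```

In this toy example the first candidate shares the 8-gram "find the number of positive integers less than" with the benchmark question and is dropped, while the second candidate is kept.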

Appendix D Training details

We take a model that has already been pretrained and instruction tuned and further finetune it for reasoning. Specifically, we use Qwen2.5-32B-Instruct (Qwen et al., 2024), which on math tasks generally matches or outperforms the larger Qwen2.5-72B-Instruct (Qwen et al., 2024) or other open models (Dubey et al., 2024; Groeneveld et al., 2024; Muennighoff et al., 2024). We use token delimiters to separate the thinking stage from the answering stage. We enclose the thinking stage between <|im_start|>think and <|im_start|>answer, each preceded and followed by a newline. Samples from our dataset are in §D.2. We use basic fine-tuning hyperparameters: we train for 5 epochs with a batch size of 16 for a total of 315 gradient steps. We train in bfloat16 precision with a learning rate of 1e-5 warmed up linearly for 5% (16 steps) and then decayed to 0 over the rest of training (299 steps) following a cosine schedule. We use the AdamW optimizer (Loshchilov & Hutter, 2019) with $\beta_1 = 0.9$, $\beta_2 = 0.95$ and weight decay of 1e-4. We do not compute loss on questions, only on reasoning traces and solutions. We ensure the sequence length is large enough to avoid cutting off any samples, a setting we ablate in §D.1. The training takes just 26 minutes on 16 NVIDIA H100 GPUs.
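The step counts and the learning-rate schedule described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it assumes the last partial batch of each epoch counts as a step (which yields 63 steps per epoch), and it uses one common convention for linear warmup followed by cosine decay; the exact scheduler implementation used for s1-32B is not given in the text.

```python
import math

NUM_SAMPLES = 1000   # size of s1K
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 1e-5

# Assumption: a final partial batch in each epoch still counts as one step.
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / BATCH_SIZE)   # 63
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                   # 315
WARMUP_STEPS = round(0.05 * TOTAL_STEPS)                 # 16


def learning_rate(step):
    """Learning rate after `step` optimizer steps (0 <= step <= TOTAL_STEPS)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


schedule = [learning_rate(step) for step in range(TOTAL_STEPS + 1)]
```

Under these assumptions the schedule peaks at step 16 and spends the remaining 299 steps decaying, matching the counts stated in the paragraph.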

Figure 9: Training dynamics of s1-32B on s1K.

D.1 Training Ablations: Sequence length

Table 8: Training sequence length ablation. We report “accuracy / average thinking tokens per sample”; the higher the accuracy and the fewer the thinking tokens (inference cost) the better.

| | Model A | Model B |
|---|---|---|
| Training sequence length | 4096 | 32768 |
| % training samples cutoff | 74% | 0% |
| AIME24 | 30.0% / 20721 | 50.0% / 6984 |
| MATH500 | 90.0% / 5324 | 91.0% / 3268 |
| GPQA | 52.5% / 6841 | 53.0% / 3568 |

Besides our scaling ablations in §5.2, the main training hyperparameter we ablate is the sequence length used during training. We find that a shorter training sequence length leads to longer reasoning traces at test time. This is because, when training with a shorter sequence length, the answer section of the training sample is more commonly cut off. Conversely, when the training sequence length is longer, more samples appear in their entirety, including the section where the model answers. Thus the model receives more gradient updates where it learns to generate an answer following its chain. This in turn leads to a higher log probability of the answer section at any point during the generation and thus shorter reasoning traces at test time. Performance-wise, we also find that the model trained with a longer sequence length performs better. Thus we opt for the longest training sequence length, as it leads to better performance and makes inference more efficient by leading to shorter reasoning traces.
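To make the cutoff argument concrete, the following Python sketch counts how many samples are truncated at a given training sequence length and how many answer tokens survive to contribute to the loss. The token counts are invented for illustration and the function `truncation_stats` is hypothetical; it assumes each sample is laid out as question, then thinking, then answer, and is truncated from the right.

```python
def truncation_stats(samples, max_seq_len):
    """Return (fraction of samples cut off, total answer tokens kept).

    samples: list of (question_tokens, thinking_tokens, answer_tokens) counts.
    """
    cut_off = 0
    answer_tokens_kept = 0
    for question_len, thinking_len, answer_len in samples:
        total_len = question_len + thinking_len + answer_len
        if total_len > max_seq_len:
            cut_off += 1
        # The answer starts after the question and the thinking section.
        answer_start = question_len + thinking_len
        kept_len = min(total_len, max_seq_len)
        answer_tokens_kept += max(0, kept_len - answer_start)
    return cut_off / len(samples), answer_tokens_kept


# Made-up token counts for three samples.
toy_samples = [(100, 3000, 500), (100, 5000, 500), (100, 4000, 500)]
short_stats = truncation_stats(toy_samples, 4096)
long_stats = truncation_stats(toy_samples, 32768)
```

With the short limit, two of the three toy samples lose their entire answer section, so the model would see far fewer answer tokens during training than with the long limit.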

D.2 Training Samples

Table 9 and the tables that follow it contain training samples from s1K (from s1-prob, s1-teasers, and OlympicArena).

Table 9: s1K sample stemming from s1-prob. The black text is the question and the yellow text is the solution. Based solely on the question we generate the light blue reasoning trace and blue response with Gemini, which we use for training.
You are given integrable random variables X,Y0X,Y_{0}italic_X , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and Z0Z_{0}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT on the same probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P})( roman_Ω , caligraphic_F , blackboard_P ), and two σ\sigmaitalic_σ-algebras 𝒜,.\mathcal{A}\subset\mathcal{F},\mathcal{B}\subset\mathcal{F}.caligraphic_A ⊂ caligraphic_F , caligraphic_B ⊂ caligraphic_F . For k=1,2,,k=1,2,\ldots,italic_k = 1 , 2 , … ,, let
您在同一概率空间 (Ω,,)(\Omega,\mathcal{F},\mathbb{P})( roman_Ω , caligraphic_F , blackboard_P ) 上被赋予可积随机变量 X,Y0subscript0X,Y_{0}italic_X , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPTZ0subscript0Z_{0}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ,以及两个 σ\sigmaitalic_σ 代数 𝒜,.formulae-sequence\mathcal{A}\subset\mathcal{F},\mathcal{B}\subset\mathcal{F}.caligraphic_A ⊂ caligraphic_F , caligraphic_B ⊂ caligraphic_F . 。对于 k=1,2,,12k=1,2,\ldots,italic_k = 1 , 2 , … , ,令
Yk:=𝔼[X|σ(𝒜,Z0,,Zk1)],Zk:=𝔼[X|σ(,Y0,,Yk1)].Y_{k}:=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},\ldots,Z_{k-1})],\quad Z_{k}:=% \mathbb{E}[X|\sigma(\mathcal{B},Y_{0},\ldots,Y_{k-1})].italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := blackboard_E [ italic_X | italic_σ ( caligraphic_A , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ) ] , italic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := blackboard_E [ italic_X | italic_σ ( caligraphic_B , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ) ] . (a) Show that there exist integrable random variables YY_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT and ZZ_{\infty}italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT such that as nn\to\inftyitalic_n → ∞ both 𝔼[|YnY|]0\mathbb{E}[|Y_{n}-Y_{\infty}|]\to 0blackboard_E [ | italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | ] → 0 and 𝔼[|ZnZ|]0\mathbb{E}[|Z_{n}-Z_{\infty}|]\to 0blackboard_E [ | italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | ] → 0. (b) Prove that almost surely Y=Z.Y_{\infty}=Z_{\infty}.italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT .
(a) 证明存在可积随机变量 YsubscriptY_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPTZsubscriptZ_{\infty}italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ,使得当 nn\to\inftyitalic_n → ∞ 时, 𝔼[|YnY|]0delimited-[]subscriptsubscript0\mathbb{E}[|Y_{n}-Y_{\infty}|]\to 0blackboard_E [ | italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | ] → 0𝔼[|ZnZ|]0delimited-[]subscriptsubscript0\mathbb{E}[|Z_{n}-Z_{\infty}|]\to 0blackboard_E [ | italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | ] → 0 均成立。 (b) 证明几乎必然有 Y=Z.subscriptsubscriptY_{\infty}=Z_{\infty}.italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT .
1. Introduce the notations, n:=σ(𝒜,Z0,,Zn1)\mathcal{F}_{n}:=\sigma(\mathcal{A},Z_{0},\ldots,Z_{n-1})caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT := italic_σ ( caligraphic_A , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ) and 𝒢n:=σ(,Y0,,Yn1)\mathcal{G}_{n}:=\sigma(\mathcal{B},Y_{0},\ldots,Y_{n-1})caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT := italic_σ ( caligraphic_B , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ), for all n1n\geq 1italic_n ≥ 1 and 0:=𝒜\mathcal{F}_{0}:=\mathcal{A}caligraphic_F start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := caligraphic_A, 𝒢0:=\mathcal{G}_{0}:=\mathcal{B}caligraphic_G start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := caligraphic_B. Clearly, both are filtrations and by definition, Yn=𝔼[X|n]Y_{n}=\mathbb{E}\left[X\rvert\mathcal{F}_{n}\right]italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = blackboard_E [ italic_X | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] and Zn=𝔼[X|𝒢n]Z_{n}=\mathbb{E}\left[X\rvert\mathcal{G}_{n}\right]italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = blackboard_E [ italic_X | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ], for all n0n\geq 0italic_n ≥ 0. 
Thus {Yn}n0\left\{Y_{n}\right\}_{n\geq 0}{ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT (or {Zn}n0\left\{Z_{n}\right\}_{n\geq 0}{ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT ) is a Doob’s Martingale with respect to filtration {n}n0\left\{\mathcal{F}_{n}\right\}_{n\geq 0}{ caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT (or {𝒢n}n0\left\{\mathcal{G}_{n}\right\}_{n\geq 0}{ caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT) and hence is U.I., XXitalic_X being integrable. Now apply the fact that a Martingale is U.I. if and only if it converges in L1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT and conclude.
介绍所有 n11n\geq 1italic_n ≥ 10:=𝒜assignsubscript0\mathcal{F}_{0}:=\mathcal{A}caligraphic_F start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := caligraphic_A 的记号 n:=σ(𝒜,Z0,,Zn1)assignsubscriptsubscript0subscript1\mathcal{F}_{n}:=\sigma(\mathcal{A},Z_{0},\ldots,Z_{n-1})caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT := italic_σ ( caligraphic_A , italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT )𝒢n:=σ(,Y0,,Yn1)assignsubscriptsubscript0subscript1\mathcal{G}_{n}:=\sigma(\mathcal{B},Y_{0},\ldots,Y_{n-1})caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT := italic_σ ( caligraphic_B , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT )𝒢0:=assignsubscript0\mathcal{G}_{0}:=\mathcal{B}caligraphic_G start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT := caligraphic_B 。显然,这两个都是滤波,并且根据定义,对于所有 n00n\geq 0italic_n ≥ 0Yn=𝔼[X|n]Y_{n}=\mathbb{E}\left[X\rvert\mathcal{F}_{n}\right]italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = blackboard_E [ italic_X | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ]Zn=𝔼[X|𝒢n]Z_{n}=\mathbb{E}\left[X\rvert\mathcal{G}_{n}\right]italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = blackboard_E [ italic_X | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] 。因此, {Yn}n0subscriptsubscript0\left\{Y_{n}\right\}_{n\geq 0}{ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT (或 {Zn}n0subscriptsubscript0\left\{Z_{n}\right\}_{n\geq 0}{ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT )是相对于滤波 {n}n0subscriptsubscript0\left\{\mathcal{F}_{n}\right\}_{n\geq 0}{ caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT (或 {𝒢n}n0subscriptsubscript0\left\{\mathcal{G}_{n}\right\}_{n\geq 0}{ caligraphic_G start_POSTSUBSCRIPT italic_n 
end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT )的 Doob 鞅,因此是 U.I., XXitalic_X 是可积的。现在应用一个鞅是 U.I.当且仅当它在 L1superscript1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 中收敛的事实,并得出结论。
2. We provide two different proofs.
我们提供了两种不同的证明。
1. Observe the following identity which follows from Tower Property. 𝔼[Yn|𝒢n]=𝔼[𝔼[X|n]|𝒢n]=𝔼[𝔼[X|𝒢n]|n]=𝔼[Zn|n].{}\mathbb{E}\left[Y_{n}\rvert\mathcal{G}_{n}\right]=\mathbb{E}\left[\mathbb{E}% \left[X\rvert\mathcal{F}_{n}\right]\rvert\mathcal{G}_{n}\right]=\mathbb{E}% \left[\mathbb{E}\left[X\rvert\mathcal{G}_{n}\right]\rvert\mathcal{F}_{n}\right% ]=\mathbb{E}\left[Z_{n}\rvert\mathcal{F}_{n}\right].blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] = blackboard_E [ blackboard_E [ italic_X | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] = blackboard_E [ blackboard_E [ italic_X | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] = blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] . 
(4) Note that 𝔼|𝔼[Yn|𝒢n]𝔼[Y|𝒢n]|𝔼|YnY|=o(1)\mathbb{E}\rvert\mathbb{E}\left[Y_{n}\rvert\mathcal{G}_{n}\right]-\mathbb{E}% \left[Y_{\infty}\rvert\mathcal{G}_{n}\right]\rvert\leq\mathbb{E}|Y_{n}-Y_{% \infty}|=o(1)blackboard_E | blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] - blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] | ≤ blackboard_E | italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | = italic_o ( 1 ) and by Levy’s Upward Theorem, 𝔼[Y|𝒢n]L1𝔼[Y|𝒢]\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{n}\right]\stackrel{{\scriptstyle L% ^{1}}}{{\longrightarrow}}\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{\infty}\right]blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ], where 𝒢=σ(n0𝒢n)=σ(,Y0,Y1).\mathcal{G}_{\infty}=\sigma\left(\cup_{n\geq 0}\mathcal{G}_{n}\right)=\sigma(% \mathcal{B},Y_{0},Y_{1}\ldots).caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_σ ( ∪ start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_σ ( caligraphic_B , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … ) . 
It is evident that Ym𝒢Y_{\infty}\in m\mathcal{G}_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ∈ italic_m caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT (since it is almost sure limit of Ynm𝒢nY_{n}\in m\mathcal{G}_{n}italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ italic_m caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT) and hence 𝔼[Y|𝒢]=Y\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{\infty}\right]=Y_{\infty}blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ] = italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT almost surely. Combining the above observations we conclude that 𝔼[Yn|𝒢n]L1Y.\mathbb{E}\left[Y_{n}\rvert\mathcal{G}_{n}\right]\stackrel{{\scriptstyle L^{1}% }}{{\longrightarrow}}Y_{\infty}.blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT . Similarly, 𝔼[Zn|n]L1Z.\mathbb{E}\left[Z_{n}\rvert\mathcal{F}_{n}\right]\stackrel{{\scriptstyle L^{1}% }}{{\longrightarrow}}Z_{\infty}.blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT . Now invoke Equation (LABEL:tower), to conclude that Y=ZY_{\infty}=Z_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT almost surely.
请注意, 𝔼|𝔼[Yn|𝒢n]𝔼[Y|𝒢n]|𝔼|YnY|=o(1)\mathbb{E}\rvert\mathbb{E}\left[Y_{n}\rvert\mathcal{G}_{n}\right]-\mathbb{E}% \left[Y_{\infty}\rvert\mathcal{G}_{n}\right]\rvert\leq\mathbb{E}|Y_{n}-Y_{% \infty}|=o(1)blackboard_E | blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] - blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] | ≤ blackboard_E | italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | = italic_o ( 1 ) 和根据 Levy 的上向定理 𝔼[Y|𝒢n]L1𝔼[Y|𝒢]\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{n}\right]\stackrel{{\scriptstyle L% ^{1}}}{{\longrightarrow}}\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{\infty}\right]blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ] ,其中 𝒢=σ(n0𝒢n)=σ(,Y0,Y1).subscriptsubscript0subscriptsubscript0subscript1\mathcal{G}_{\infty}=\sigma\left(\cup_{n\geq 0}\mathcal{G}_{n}\right)=\sigma(% \mathcal{B},Y_{0},Y_{1}\ldots).caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_σ ( ∪ start_POSTSUBSCRIPT italic_n ≥ 0 end_POSTSUBSCRIPT caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_σ ( caligraphic_B , italic_Y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … ) . 
显然, Ym𝒢subscriptsubscriptY_{\infty}\in m\mathcal{G}_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ∈ italic_m caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT (因为它几乎是 Ynm𝒢nsubscriptsubscriptY_{n}\in m\mathcal{G}_{n}italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ italic_m caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT 的极限)因此 𝔼[Y|𝒢]=Y\mathbb{E}\left[Y_{\infty}\rvert\mathcal{G}_{\infty}\right]=Y_{\infty}blackboard_E [ italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ] = italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT 几乎必然。结合上述观察,我们得出结论 𝔼[Yn|𝒢n]L1Y.\mathbb{E}\left[Y_{n}\rvert\mathcal{G}_{n}\right]\stackrel{{\scriptstyle L^{1}% }}{{\longrightarrow}}Y_{\infty}.blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT . 同样地, 𝔼[Zn|n]L1Z.\mathbb{E}\left[Z_{n}\rvert\mathcal{F}_{n}\right]\stackrel{{\scriptstyle L^{1}% }}{{\longrightarrow}}Z_{\infty}.blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT end_ARG end_RELOP italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT . 现在调用方程(LABEL:tower),以得出结论 Y=ZsubscriptsubscriptY_{\infty}=Z_{\infty}italic_Y start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT 几乎必然。

观察以下由塔性质得出的恒等式。
2. Since $X$ is integrable, we have by Lévy's Upward Theorem, $Y_{\infty}=\lim_{n\to\infty}Y_{n}=\lim_{n\to\infty}\mathbb{E}\left[X\mid\mathcal{F}_{n}\right]=\mathbb{E}\left[X\mid\mathcal{F}_{\infty}\right].$ Similarly, $Z_{\infty}=\mathbb{E}\left[X\mid\mathcal{G}_{\infty}\right].$ On the other hand, in the first proof we observed that $Y_{\infty}\in m\mathcal{G}_{\infty}$, and hence $Y_{\infty}\in m\mathcal{G}_{\infty}\cap m\mathcal{F}_{\infty}=m\left(\mathcal{G}_{\infty}\cap\mathcal{F}_{\infty}\right)$. This yields the following. $$Y_{\infty}=\mathbb{E}\left[Y_{\infty}\mid\mathcal{G}_{\infty}\cap\mathcal{F}_{\infty}\right]=\mathbb{E}\left[\mathbb{E}\left[X\mid\mathcal{F}_{\infty}\right]\mid\mathcal{G}_{\infty}\cap\mathcal{F}_{\infty}\right]=\mathbb{E}\left[X\mid\mathcal{G}_{\infty}\cap\mathcal{F}_{\infty}\right].$$ Similarly, $Z_{\infty}=\mathbb{E}\left[X\mid\mathcal{G}_{\infty}\cap\mathcal{F}_{\infty}\right].$
This proves that $Y_{\infty}=Z_{\infty}$ almost surely.
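The statement $Y_{\infty}=Z_{\infty}$ can be checked numerically on a toy example. The sketch below is an illustration only, under assumptions that are not part of the problem: a finite sample space with equally likely outcomes, sigma-algebras $\mathcal{A}$ and $\mathcal{B}$ encoded as cell labels, and $Y_0 = Z_0 = 0$. The helper names `cond_exp` and `iterate` are hypothetical. On a finite space the generated partitions stop refining after finitely many steps, so the last iterate stands in for the limit.

```python
from fractions import Fraction


def cond_exp(x, *generators):
    """E[x | sigma(generators)] on a finite space with uniform probability.

    Each generator lists its value at every outcome. Outcomes that agree on
    all generators form one cell, and x is averaged over each cell.
    """
    keys = list(zip(*generators))
    total, count = {}, {}
    for key, value in zip(keys, x):
        total[key] = total.get(key, 0) + value
        count[key] = count.get(key, 0) + 1
    # Exact rational arithmetic, so equal cell means compare equal.
    return [Fraction(total[key]) / count[key] for key in keys]


def iterate(x, a, b, y0, z0, steps):
    """Return ([Y_0..Y_steps], [Z_0..Z_steps]) where
    Y_k = E[X | sigma(A, Z_0..Z_{k-1})] and Z_k = E[X | sigma(B, Y_0..Y_{k-1})]."""
    ys, zs = [y0], [z0]
    for _ in range(steps):
        y_next = cond_exp(x, a, *zs)  # uses Z_0..Z_{k-1}
        z_next = cond_exp(x, b, *ys)  # uses Y_0..Y_{k-1}
        ys.append(y_next)
        zs.append(z_next)
    return ys, zs


# Example 1 (assumed data): eight outcomes, A and B given by cell labels.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
x = [1, 3, 2, 6, 5, 7, 0, 4]
ys, zs = iterate(x, a, b, [0] * 8, [0] * 8, steps=4)
print(ys[-1] == zs[-1])

# Example 2 (assumed data): E[X|A] and E[X|B] are both constant, so no
# information is ever exchanged and the common limit differs from E[X|sigma(A, B)].
a2 = [0, 0, 1, 1]
b2 = [0, 1, 0, 1]
x2 = [1, -1, -1, 1]
ys2, zs2 = iterate(x2, a2, b2, [0] * 4, [0] * 4, steps=4)
print(ys2[-1] == zs2[-1], cond_exp(x2, a2, b2) == ys2[-1])
```

In the second example both sequences stay at zero while conditioning on $\mathcal{A}$ and $\mathcal{B}$ jointly recovers $X$ itself, which suggests that the common limit need not equal $\mathbb{E}[X\mid\sigma(\mathcal{A}\cup\mathcal{B})]$.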
Let's analyze the problem. We are given two sequences of random variables $Y_k$ and $Z_k$ defined by conditional expectations. We need to show that these sequences converge in $L^1$ and that their limits are equal almost surely.
Part (a): Convergence in $L^1$.
The sigma-algebras involved are increasing:
$\mathcal{F}_k=\sigma(\mathcal{A},Z_0,\ldots,Z_{k-1})$
$\mathcal{G}_k=\sigma(\mathcal{B},Y_0,\ldots,Y_{k-1})$
For the $Y_k$ sequence:
$Y_1=\mathbb{E}[X|\sigma(\mathcal{A})]$
$Y_2=\mathbb{E}[X|\sigma(\mathcal{A},Z_0)]$
$Y_3=\mathbb{E}[X|\sigma(\mathcal{A},Z_0,Z_1)]$
The sequence of sigma-algebras $\sigma(\mathcal{A},Z_0,\ldots,Z_{k-1})$ is increasing. By the Martingale Convergence Theorem (for conditional expectations), since $X$ is integrable, the sequence $Y_k$ converges almost surely and in $L^1$ to $\mathbb{E}[X|\sigma(\mathcal{A},Z_0,Z_1,\ldots)]$. Let $Y_\infty=\mathbb{E}[X|\sigma(\mathcal{A},Z_0,Z_1,\ldots)]$.
Similarly, for the $Z_k$ sequence:
$Z_1=\mathbb{E}[X|\sigma(\mathcal{B})]$
$Z_2=\mathbb{E}[X|\sigma(\mathcal{B},Y_0)]$
$Z_3=\mathbb{E}[X|\sigma(\mathcal{B},Y_0,Y_1)]$
The sequence of sigma-algebras $\sigma(\mathcal{B},Y_0,\ldots,Y_{k-1})$ is increasing. By the Martingale Convergence Theorem, the sequence $Z_k$ converges almost surely and in $L^1$ to $\mathbb{E}[X|\sigma(\mathcal{B},Y_0,Y_1,\ldots)]$. Let $Z_\infty=\mathbb{E}[X|\sigma(\mathcal{B},Y_0,Y_1,\ldots)]$.
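For reference, the convergence result invoked in Part (a) is commonly stated as Lévy's upward theorem. The block below is a standard textbook formulation rather than a quotation from the paper; the symbol $\mathcal{F}_\infty$ for the limiting sigma-algebra is introduced here for illustration.

```latex
% Levy's upward theorem (standard statement).
% Assumptions: X is integrable and (\mathcal{F}_k) is an increasing sequence
% of sigma-algebras with \mathcal{F}_\infty := \sigma(\bigcup_k \mathcal{F}_k).
\[
  \mathbb{E}[X \mid \mathcal{F}_k]
  \;\longrightarrow\;
  \mathbb{E}[X \mid \mathcal{F}_\infty]
  \quad \text{almost surely and in } L^1 \text{ as } k \to \infty .
\]
% Applied to \mathcal{F}_k = \sigma(\mathcal{A}, Z_0, \ldots, Z_{k-1})
% and \mathcal{G}_k = \sigma(\mathcal{B}, Y_0, \ldots, Y_{k-1}):
\[
  Y_k \longrightarrow Y_\infty = \mathbb{E}[X \mid \sigma(\mathcal{A}, Z_0, Z_1, \ldots)],
  \qquad
  Z_k \longrightarrow Z_\infty = \mathbb{E}[X \mid \sigma(\mathcal{B}, Y_0, Y_1, \ldots)] .
\]
```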
Part (b): Prove that almost surely $Y_\infty=Z_\infty$.
We have $Y_\infty=\mathbb{E}[X|\sigma(\mathcal{A},Z_0,Z_1,\ldots)]$ and $Z_\infty=\mathbb{E}[X|\sigma(\mathcal{B},Y_0,Y_1,\ldots)]$.
Let $\mathcal{Y}_\infty=\sigma(\mathcal{A},Z_0,Z_1,\ldots)$ and $\mathcal{Z}_\infty=\sigma(\mathcal{B},Y_0,Y_1,\ldots)$.
Then $Y_\infty=\mathbb{E}[X|\mathcal{Y}_\infty]$ and $Z_\infty=\mathbb{E}[X|\mathcal{Z}_\infty]$.
Consider $Y_1=\mathbb{E}[X|\mathcal{A}]$. $Z_1=\mathbb{E}[X|\mathcal{B}]$.
$Y_2=\mathbb{E}[X|\sigma(\mathcal{A},Z_0)]$. Since $Z_0=\mathbb{E}[X|\mathcal{B}]$, $\sigma(\mathcal{A},Z_0)=\sigma(\mathcal{A},\mathbb{E}[X|\mathcal{B}])$.
$Z_2=\mathbb{E}[X|\sigma(\mathcal{B},Y_0)]$. Since $Y_0$ is not defined, let's assume $Y_0$ and $Z_0$ are given.
Let's consider the sigma-algebra generated by all the $Y_i$ and $Z_i$.
Let $\mathcal{G}=\sigma(Y_0,Z_0,Y_1,Z_1,\ldots)$. Both $Y_k$ and $Z_k$ are measurable with respect to $\mathcal{G}$. Thus, $Y_\infty$ and $Z_\infty$ are measurable with respect to $\mathcal{G}$.
Consider $\mathbb{E}[Y_{n+1}|\mathcal{A},Z_0,\ldots,Z_{n-1}]=\mathbb{E}[\mathbb{E}[X|\sigma(\mathcal{A},Z_0,\ldots,Z_n)]|\mathcal{A},Z_0,\ldots,Z_{n-1}]=Y_n$.
Consider $\mathbb{E}[Z_{n+1}|\mathcal{B},Y_0,\ldots,Y_{n-1}]=\mathbb{E}[\mathbb{E}[X|\sigma(\mathcal{B},Y_0,\ldots,Y_n)]|\mathcal{B},Y_0,\ldots,Y_{n-1}]=Z_n$.
Let $\mathcal{H}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\sigma(Y_0,Z_0,Y_1,Z_1,\ldots))$.
Consider $\mathbb{E}[X|\mathcal{A},Z_0,\ldots,Z_{k-1}]=Y_k$.
Consider $\mathbb{E}[X|\mathcal{B},Y_0,\ldots,Y_{k-1}]=Z_k$.
Let's consider the limit sigma-algebras $\mathcal{Y}_\infty=\sigma(\mathcal{A},Z_0,Z_1,\ldots)$ and $\mathcal{Z}_\infty=\sigma(\mathcal{B},Y_0,Y_1,\ldots)$.
We know that $Y_\infty$ is $\mathcal{Y}_\infty$-measurable and $Z_\infty$ is $\mathcal{Z}_\infty$-measurable.
Let $\mathcal{C}=\sigma(\mathcal{A}\cup\mathcal{B})$.
$Y_1=\mathbb{E}[X|\mathcal{A}]$. $Z_1=\mathbb{E}[X|\mathcal{B}]$.
$Y_2=\mathbb{E}[X|\sigma(\mathcal{A},Z_0)]=\mathbb{E}[X|\sigma(\mathcal{A},\mathbb{E}[X|\mathcal{B}])]$.
$Z_2=\mathbb{E}[X|\sigma(\mathcal{B},Y_0)]=\mathbb{E}[X|\sigma(\mathcal{B},\mathbb{E}[X|\mathcal{A}])]$.
Consider the case when $\mathcal{A}$ and $\mathcal{B}$ are independent. Then $Y_1=\mathbb{E}[X|\mathcal{A}]$ and $Z_1=\mathbb{E}[X|\mathcal{B}]$.
$Y_2=\mathbb{E}[X|\sigma(\mathcal{A},\mathbb{E}[X|\mathcal{B}])]$.
Let's consider the sigma-algebra generated by all the information. Let $\mathcal{F}_\infty=\sigma(\mathcal{A},\mathcal{B},Y_0,Z_0,Y_1,Z_1,\ldots)$.
Then both $Y_\infty$ and $Z_\infty$ are $\mathcal{F}_\infty$-measurable.
We have $Y_\infty=\lim_{n\to\infty}Y_n$ and $Z_\infty=\lim_{n\to\infty}Z_n$ a.s. and in $L^1$.
Consider $\mathbb{E}[X|\mathcal{A},Z_0,Z_1,\ldots]=Y_\infty$ and $\mathbb{E}[X|\mathcal{B},Y_0,Y_1,\ldots]=Z_\infty$.
Let $\mathcal{H}_\infty=\sigma(\mathcal{A}\cup\mathcal{B})$. Consider $\mathbb{E}[X|\mathcal{H}_\infty]$.
Let's try to show that both $Y_\infty$ and $Z_\infty$ are equal to $\mathbb{E}[X|\sigma(\mathcal{A}\vee\mathcal{B})]$. This seems too simple.
Let's consider the definition of conditional expectation. For any set $A\in\mathcal{Y}_\infty$, $\mathbb{E}[Y_\infty 1_A]=\mathbb{E}[X1_A]$.
For any set $B\in\mathcal{Z}_\infty$, $\mathbb{E}[Z_\infty 1_B]=\mathbb{E}[X1_B]$.
Consider $\mathbb{E}[Y_{k+1}|\mathcal{F}_k]=Y_k$ where $\mathcal{F}_k=\sigma(\mathcal{A},Z_0,\ldots,Z_{k-1})$.
Consider $\mathbb{E}[Z_{k+1}|\mathcal{G}_k]=Z_k$ where $\mathcal{G}_k=\sigma(\mathcal{B},Y_0,\ldots,Y_{k-1})$.
Consider the sigma-algebra $\sigma(\mathcal{A}\cup\mathcal{B})$.
Let $W=\mathbb{E}[X|\sigma(\mathcal{A}\cup\mathcal{B})]$. We have $Y_{\infty}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)]$. Since $Z_{k}$ depends on $Y_{k-1}$ which depends on $Z_{k-2}$ etc., it is complex.
Let’s consider the case where $Y_{0}=\mathbb{E}[X|\mathcal{A}]$ and $Z_{0}=\mathbb{E}[X|\mathcal{B}]$.
Then $Y_{1}=\mathbb{E}[X|\mathcal{A}]=Y_{0}$. $Z_{1}=\mathbb{E}[X|\mathcal{B}]=Z_{0}$. If this were the case, then $Y_{k}=Y_{0}$ and $Z_{k}=Z_{0}$, so $Y_{\infty}=Y_{0}$ and $Z_{\infty}=Z_{0}$.
We need $Y_{0}=Z_{0}$ a.s., i.e., $\mathbb{E}[X|\mathcal{A}]=\mathbb{E}[X|\mathcal{B}]$ a.s. This is not generally true.
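A small finite example makes the last remark concrete. The sketch below is an illustration added here and not part of the original trace: it assumes a hypothetical four-point probability space with uniform weights, takes $X(\omega)=\omega$, and represents $\mathcal{A}$ and $\mathcal{B}$ by labels of their atoms. The helper name `cond_exp` and the chosen partitions are assumptions made for the example.

```python
from fractions import Fraction

def cond_exp(x, prob, labels):
    """E[x | G], where the atoms of G are the level sets of `labels`."""
    mass, weighted = {}, {}
    for xi, pi, li in zip(x, prob, labels):
        mass[li] = mass.get(li, 0) + pi            # P(atom)
        weighted[li] = weighted.get(li, 0) + xi * pi  # E[x; atom]
    # On each atom the conditional expectation is E[x; atom] / P(atom).
    return [weighted[li] / mass[li] for li in labels]

# Hypothetical toy space: four equally likely outcomes, X(w) = w.
prob = [Fraction(1, 4)] * 4
X = [Fraction(w) for w in range(4)]
labels_A = [0, 0, 1, 1]  # atoms {0,1} and {2,3}
labels_B = [0, 1, 0, 1]  # atoms {0,2} and {1,3}

Y0 = cond_exp(X, prob, labels_A)  # E[X | A]
Z0 = cond_exp(X, prob, labels_B)  # E[X | B]
print("Y0 =", Y0)
print("Z0 =", Z0)
print("equal:", Y0 == Z0)
```

Here $\mathbb{E}[X|\mathcal{A}]$ takes the values $1/2$ and $5/2$ while $\mathbb{E}[X|\mathcal{B}]$ takes the values $1$ and $2$, so the two conditional expectations differ on every outcome.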
Consider the sigma-algebra generated by all $Y_{i}$ and $Z_{i}$, denoted by $\mathcal{G}_{\infty}=\sigma(Y_{0},Z_{0},Y_{1},Z_{1},\ldots)$.
Then $\mathcal{Y}_{\infty}=\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)\subseteq\sigma(\mathcal{A},\mathcal{G}_{\infty})$. $\mathcal{Z}_{\infty}=\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)\subseteq\sigma(\mathcal{B},\mathcal{G}_{\infty})$.
Consider $\mathbb{E}[X|\mathcal{A},Z_{0},Z_{1},\ldots]=Y_{\infty}$. This means for any $A\in\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$, $\mathbb{E}[X1_{A}]=\mathbb{E}[Y_{\infty}1_{A}]$.
Consider $\mathbb{E}[X|\mathcal{B},Y_{0},Y_{1},\ldots]=Z_{\infty}$. This means for any $B\in\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$, $\mathbb{E}[X1_{B}]=\mathbb{E}[Z_{\infty}1_{B}]$.
Let $\mathcal{H}=\sigma(\mathcal{A},\mathcal{B})$. Consider $\mathbb{E}[X|\mathcal{H}]$. Consider $\mathbb{E}[Y_{\infty}|\mathcal{B},Y_{0},Y_{1},\ldots]=\mathbb{E}[\mathbb{E}[X|\mathcal{A},Z_{0},Z_{1},\ldots]|\mathcal{B},Y_{0},Y_{1},\ldots]$. Let’s use the fact that $Y_{\infty}$ is $\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$-measurable and $Z_{\infty}$ is $\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$-measurable.
Consider the smallest sigma-algebra containing both $\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$ and $\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$.
This is $\sigma(\mathcal{A}\cup\mathcal{B}\cup\{Y_{i}\}_{i\geq 0}\cup\{Z_{i}\}_{i\geq 0})$. Let’s try to show that $\mathbb{E}[(Y_{\infty}-Z_{\infty})^{2}]=0$.
$\mathbb{E}[(Y_{\infty}-Z_{\infty})^{2}]=\mathbb{E}[Y_{\infty}^{2}]+\mathbb{E}[Z_{\infty}^{2}]-2\mathbb{E}[Y_{\infty}Z_{\infty}]$. Consider the definition of $Y_{k}$ and $Z_{k}$.
$\mathbb{E}[Y_{k+1}|\mathcal{A},Z_{0},\ldots,Z_{k-1}]=Y_{k}$. $\mathbb{E}[Z_{k+1}|\mathcal{B},Y_{0},\ldots,Y_{k-1}]=Z_{k}$. Consider $\mathbb{E}[Y_{\infty}|\mathcal{B},Y_{0},Y_{1},\ldots]$. Since $Y_{\infty}$ is measurable with respect to $\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$, this does not simplify easily.
Let’s assume $Y_{\infty}\neq Z_{\infty}$ on a set of positive measure.
Consider the case where $\mathcal{A}$ and $\mathcal{B}$ are trivial.
Then $Y_{k}=\mathbb{E}[X|\sigma(Z_{0},\ldots,Z_{k-1})]$ and $Z_{k}=\mathbb{E}[X|\sigma(Y_{0},\ldots,Y_{k-1})]$.
Let $\mathcal{H}_{\infty}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\{Y_{i},Z_{i}\}_{i\geq 0})$. Consider $\mathbb{E}[X|\mathcal{H}_{\infty}]$. We have $Y_{\infty}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)]$. By the properties of conditional expectation, $\mathbb{E}[Y_{\infty}|\mathcal{A}]=\mathbb{E}[\mathbb{E}[X|\mathcal{A},Z_{0},Z_{1},\ldots]|\mathcal{A}]$.
Consider the sigma-algebra $\mathcal{D}$ generated by all the random variables involved.
$\mathcal{D}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\{Y_{i}\}_{i\geq 0}\cup\{Z_{i}\}_{i\geq 0})$. Let $M_{\infty}=\mathbb{E}[X|\mathcal{A}\vee\mathcal{B}]$. Consider the limits. As $n\to\infty$, $\sigma(\mathcal{A},Z_{0},\ldots,Z_{n-1})\uparrow\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$.
As $n\to\infty$, $\sigma(\mathcal{B},Y_{0},\ldots,Y_{n-1})\uparrow\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$.
Let $\mathcal{Y}_{\infty}=\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$ and $\mathcal{Z}_{\infty}=\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$.
We have $Y_{\infty}=\mathbb{E}[X|\mathcal{Y}_{\infty}]$ and $Z_{\infty}=\mathbb{E}[X|\mathcal{Z}_{\infty}]$.
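Expansions of $\mathbb{E}[(Y_{\infty}-Z_{\infty})^{2}]$ in this argument lean on the orthogonality property of conditional expectation. As a sketch, assuming $X$ is square-integrable (an assumption not stated in this passage), pulling out the $\mathcal{Y}_{\infty}$-measurable factor $Y_{\infty}$ and applying the tower property gives:

```latex
\begin{align*}
\mathbb{E}[X Y_{\infty}]
  &= \mathbb{E}\big[\mathbb{E}[X Y_{\infty} | \mathcal{Y}_{\infty}]\big]
   = \mathbb{E}\big[Y_{\infty}\,\mathbb{E}[X | \mathcal{Y}_{\infty}]\big]
   = \mathbb{E}[Y_{\infty}^{2}], \\
\mathbb{E}[(X - Y_{\infty})^{2}]
  &= \mathbb{E}[X^{2}] - 2\,\mathbb{E}[X Y_{\infty}] + \mathbb{E}[Y_{\infty}^{2}]
   = \mathbb{E}[X^{2}] - \mathbb{E}[Y_{\infty}^{2}].
\end{align*}
```

The same identities hold with $Z_{\infty}$ and $\mathcal{Z}_{\infty}$ in place of $Y_{\infty}$ and $\mathcal{Y}_{\infty}$.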
Consider $\mathbb{E}[Y_{\infty}-Z_{\infty}]^{2}=\mathbb{E}[(Y_{\infty}-X)+(X-Z_{\infty})]^{2}$
$=\mathbb{E}[(Y_{\infty}-X)^{2}]+\mathbb{E}[(X-Z_{\infty})^{2}]+2\mathbb{E}[(Y_{\infty}-X)(X-Z_{\infty})]$
$=\mathbb{E}[(X-\mathbb{E}[X|\mathcal{Y}_{\infty}])^{2}]+\mathbb{E}[(X-\mathbb{E}[X|\mathcal{Z}_{\infty}])^{2}]+2\mathbb{E}[Y_{\infty}X-Y_{\infty}Z_{\infty}-X^{2}+XZ_{\infty}]$
$=\mathbb{E}[X^{2}]-\mathbb{E}[Y_{\infty}^{2}]+\mathbb{E}[X^{2}]-\mathbb{E}[Z_{\infty}^{2}]+2\mathbb{E}[Y_{\infty}X]-2\mathbb{E}[Y_{\infty}Z_{\infty}]-2\mathbb{E}[X^{2}]+2\mathbb{E}[XZ_{\infty}]$
$=-\mathbb{E}[Y_{\infty}^{2}]-\mathbb{E}[Z_{\infty}^{2}]+2\mathbb{E}[Y_{\infty}^{2}]-2\mathbb{E}[Y_{\infty}Z_{\infty}]+2\mathbb{E}[Z_{\infty}^{2}]$
$=\mathbb{E}[Y_{\infty}^{2}]+\mathbb{E}[Z_{\infty}^{2}]-2\mathbb{E}[Y_{\infty}Z_{\infty}]$
Consider the sigma-algebra $\mathcal{C}=\sigma(\mathcal{A}\cup\mathcal{B})$.
Let $W=\mathbb{E}[X|\mathcal{C}]$. Consider the limit sigma-algebra generated by all the information.
Let $\mathcal{F}_{\infty}=\sigma(\mathcal{A},\mathcal{B},Y_{0},Z_{0},Y_{1},Z_{1},\ldots)$. Then $\mathcal{Y}_{\infty}\subseteq\mathcal{F}_{\infty}$ and $\mathcal{Z}_{\infty}\subseteq\mathcal{F}_{\infty}$.
We have $\mathbb{E}[X|\mathcal{Y}_{\infty}]=Y_{\infty}$ and $\mathbb{E}[X|\mathcal{Z}_{\infty}]=Z_{\infty}$.
Consider $\mathbb{E}[Y_{\infty}|\mathcal{Z}_{\infty}]$ and $\mathbb{E}[Z_{\infty}|\mathcal{Y}_{\infty}]$.
Let $\mathcal{H}=\sigma(\mathcal{A}\cup\mathcal{B})$. Consider $\mathbb{E}[X|\mathcal{H}]$.
Consider the sequence of pairs $(Y_{k},Z_{k})$.
If the limits are the same, then for any bounded measurable function $f$, $\mathbb{E}[f(Y_{\infty})]=\mathbb{E}[f(Z_{\infty})]$.
Consider the sigma-algebra $\mathcal{D}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\sigma(Y_{0},Z_{0},\ldots))$.
Then both $Y_{\infty}$ and $Z_{\infty}$ are measurable with respect to $\mathcal{D}$.
Consider $\mathbb{E}[(Y_{n}-Z_{n})^{2}]$.
Let $\mathcal{G}=\sigma(\mathcal{A}\cup\mathcal{B})$. Consider $W=\mathbb{E}[X|\mathcal{G}]$.
Then $\mathbb{E}[W|\mathcal{A}]=\mathbb{E}[X|\mathcal{A}]$ and $\mathbb{E}[W|\mathcal{B}]=\mathbb{E}[X|\mathcal{B}]$.
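The two identities above are instances of the tower property of conditional expectation, which applies because $\mathcal{A}$ and $\mathcal{B}$ are both contained in $\sigma(\mathcal{A}\cup\mathcal{B})$. As a brief reminder of that standard fact, stated here under the assumption that $X$ is integrable:

```latex
% Tower property: for sub-sigma-algebras \mathcal{A} \subseteq \mathcal{G} and integrable X,
\[
\mathbb{E}\bigl[\,\mathbb{E}[X \mid \mathcal{G}] \,\big|\, \mathcal{A}\bigr]
  = \mathbb{E}[X \mid \mathcal{A}] \quad \text{a.s.}
\]
% With W = \mathbb{E}[X \mid \mathcal{G}] and \mathcal{G} = \sigma(\mathcal{A} \cup \mathcal{B}),
% this gives \mathbb{E}[W \mid \mathcal{A}] = \mathbb{E}[X \mid \mathcal{A}],
% and the same argument with \mathcal{B} \subseteq \mathcal{G} gives the second identity.
```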
Consider the limit sigma-algebra $\mathcal{L}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\sigma(Y_{0},Z_{0},\ldots))$.
We have $Y_{\infty}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)]$ and $Z_{\infty}=\mathbb{E}[X|\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)]$.
Let $\mathcal{M}=\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)\vee\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$. Consider $\mathbb{E}[X|\mathcal{M}]$.
Consider the random variable $X$. Then $Y_{\infty}=\lim_{n}\mathbb{E}[X|\mathcal{F}_{n}]$ and $Z_{\infty}=\lim_{n}\mathbb{E}[X|\mathcal{G}_{n}]$.
Let $\mathcal{H}_{\infty}=\sigma(\mathcal{A}\cup\mathcal{B}\cup\sigma(Y_{0},Z_{0},\ldots))$. Consider $E=\{\omega:Y_{\infty}(\omega)\neq Z_{\infty}(\omega)\}$. Suppose $P(E)>0$.
Consider the conditional expectation with respect to the join of the sigma-algebras.
Let $\mathcal{C}_{\infty}=\sigma(\mathcal{A},\mathcal{B},Y_{0},Z_{0},\ldots)$. Consider $\mathbb{E}[X|\mathcal{C}_{\infty}]$.
Then $\mathbb{E}[\mathbb{E}[X|\mathcal{C}_{\infty}]|\mathcal{A},Z_{0},\ldots]=\mathbb{E}[X|\mathcal{A},Z_{0},\ldots]=Y_{n+1}$.
And $\mathbb{E}[\mathbb{E}[X|\mathcal{C}_{\infty}]|\mathcal{B},Y_{0},\ldots]=\mathbb{E}[X|\mathcal{B},Y_{0},\ldots]=Z_{n+1}$.
So $Y_{\infty}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)]$ and $Z_{\infty}=\mathbb{E}[X|\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)]$.
Let $\mathcal{F}_{\infty}=\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$ and $\mathcal{G}_{\infty}=\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$.
Consider $\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]$.
Then $\mathbb{E}[\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]|\mathcal{F}_{\infty}]=\mathbb{E}[X|\mathcal{F}_{\infty}]=Y_{\infty}$.
And $\mathbb{E}[\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]|\mathcal{G}_{\infty}]=\mathbb{E}[X|\mathcal{G}_{\infty}]=Z_{\infty}$.
This means $Y_{\infty}=\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]$ a.s. and $Z_{\infty}=\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]$ a.s.
Therefore, $Y_{\infty}=Z_{\infty}$ a.s.
Solution:
(a) For the sequence $Y_{k}$, the $\sigma$-algebras $\mathcal{F}_{k}=\sigma(\mathcal{A},Z_{0},\ldots,Z_{k-1})$ are increasing as $k$ increases. Since $X$ is integrable, by the Martingale Convergence Theorem for conditional expectations, the sequence $Y_{k}=\mathbb{E}[X|\mathcal{F}_{k}]$ converges almost surely and in $L^{1}$ to $Y_{\infty}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)]$. Similarly, for the sequence $Z_{k}$, the $\sigma$-algebras $\mathcal{G}_{k}=\sigma(\mathcal{B},Y_{0},\ldots,Y_{k-1})$ are increasing. Since $X$ is integrable, by the Martingale Convergence Theorem, the sequence $Z_{k}=\mathbb{E}[X|\mathcal{G}_{k}]$ converges almost surely and in $L^{1}$ to $Z_{\infty}=\mathbb{E}[X|\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)]$.
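The recursion $Y_{k}=\mathbb{E}[X|\sigma(\mathcal{A},Z_{0},\ldots,Z_{k-1})]$, $Z_{k}=\mathbb{E}[X|\sigma(\mathcal{B},Y_{0},\ldots,Y_{k-1})]$ can be checked numerically in a toy setting. The sketch below is only an illustration under strong assumptions: a finite sample space with the uniform measure, where every $\sigma$-algebra is a partition and a conditional expectation is a block average. The helper names `cond_exp`, `iterate`, and `random_instance` are hypothetical, and exact `Fraction` arithmetic is used so that equal block averages are recognized as equal.

```python
from fractions import Fraction
import random


def cond_exp(x, labels):
    """E[X | partition] on a finite uniform space: average x over each block.

    labels[w] identifies the block of the partition that outcome w belongs to.
    """
    sums, counts = {}, {}
    for value, key in zip(x, labels):
        sums[key] = sums.get(key, Fraction(0)) + value
        counts[key] = counts.get(key, 0) + 1
    return [sums[key] / counts[key] for key in labels]


def iterate(x, a, b, y0, z0, rounds):
    """Compute (Y_k, Z_k) for k = 1..rounds and return them as a list of pairs."""
    n = len(x)
    f = [(a[w], z0[w]) for w in range(n)]  # block labels for sigma(A, Z_0)
    g = [(b[w], y0[w]) for w in range(n)]  # block labels for sigma(B, Y_0)
    history = []
    for _ in range(rounds):
        y = cond_exp(x, f)  # Y_k
        z = cond_exp(x, g)  # Z_k
        history.append((y, z))
        # Refine each partition with the newest value of the other sequence.
        f = [f[w] + (z[w],) for w in range(n)]
        g = [g[w] + (y[w],) for w in range(n)]
    return history


def random_instance(n, seed):
    """Hypothetical test data: integer-valued X, two coarse partitions, arbitrary Y_0, Z_0."""
    rng = random.Random(seed)
    x = [Fraction(rng.randint(-5, 5)) for _ in range(n)]
    a = [rng.randint(0, 2) for _ in range(n)]
    b = [rng.randint(0, 2) for _ in range(n)]
    y0 = [Fraction(rng.randint(0, 1)) for _ in range(n)]
    z0 = [Fraction(rng.randint(0, 1)) for _ in range(n)]
    return x, a, b, y0, z0


n = 12
x, a, b, y0, z0 = random_instance(n, seed=0)
# On n outcomes each partition can be strictly refined at most n - 1 times,
# so 2n + 2 rounds are enough for both sequences to become stationary.
history = iterate(x, a, b, y0, z0, rounds=2 * n + 2)
y_last, z_last = history[-1]
print("limits agree:", y_last == z_last)
```

On a finite space both sequences become constant after finitely many rounds, so the last pair plays the role of $(Y_{\infty},Z_{\infty})$. This illustrates the statements being proved but does not replace the measure-theoretic argument.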
(b) Let $\mathcal{F}_{\infty}=\sigma(\mathcal{A},Z_{0},Z_{1},\ldots)$ and $\mathcal{G}_{\infty}=\sigma(\mathcal{B},Y_{0},Y_{1},\ldots)$. We have $Y_{\infty}=\mathbb{E}[X|\mathcal{F}_{\infty}]$ and $Z_{\infty}=\mathbb{E}[X|\mathcal{G}_{\infty}]$. Consider the $\sigma$-algebra $\mathcal{H}=\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}=\sigma(\mathcal{A},\mathcal{B},Y_{0},Z_{0},Y_{1},Z_{1},\ldots)$. Consider $W=\mathbb{E}[X|\mathcal{H}]$.
We have $\mathcal{F}_{\infty}\subseteq\mathcal{H}$, so $\mathbb{E}[W|\mathcal{F}_{\infty}]=\mathbb{E}[\mathbb{E}[X|\mathcal{H}]|\mathcal{F}_{\infty}]=\mathbb{E}[X|\mathcal{F}_{\infty}]=Y_{\infty}$.
Since $W$ is $\mathcal{H}$-measurable, $\mathbb{E}[W|\mathcal{F}_{\infty}]=Y_{\infty}$ implies that $Y_{\infty}$ is the conditional expectation of $W$ given $\mathcal{F}_{\infty}$. But since $Y_{\infty}$ is already $\mathcal{F}_{\infty}$-measurable, we have $\mathbb{E}[W|\mathcal{F}_{\infty}]=W$ if $W$ is $\mathcal{F}_{\infty}$-measurable, which is not necessarily true.
However, we know that $\mathbb{E}[Y_{\infty}|\mathcal{F}_{\infty}]=Y_{\infty}$ and $\mathbb{E}[Z_{\infty}|\mathcal{G}_{\infty}]=Z_{\infty}$.
Consider $\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]$. Then $\mathbb{E}[\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]|\mathcal{F}_{\infty}]=\mathbb{E}[X|\mathcal{F}_{\infty}]=Y_{\infty}$. This implies that $\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]=Y_{\infty}$ almost surely.
Similarly, $\mathbb{E}[\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]|\mathcal{G}_{\infty}]=\mathbb{E}[X|\mathcal{G}_{\infty}]=Z_{\infty}$. This implies that $\mathbb{E}[X|\mathcal{F}_{\infty}\vee\mathcal{G}_{\infty}]=Z_{\infty}$ almost surely.
Therefore, $Y_{\infty}=Z_{\infty}$ almost surely.
Final Answer: The final answer is $\boxed{Y_{\infty}=Z_{\infty}}$
Table 10: s1K sample stemming from s1-teasers. The black text is the question and the yellow text is the solution. Based solely on the question we generate the light blue reasoning trace and blue response with Gemini, which we use for training.
Alice and Bob each have a coin and flip it until they get a heads. If Bob flipped his coin more times than Alice, what is the expected number of times Alice flipped her coin?
4/3
The problem asks for the expected number of times Alice flipped her coin, given that Bob flipped his coin more times than Alice.
Let $X$ be the number of flips Alice makes until she gets a head.
Let $Y$ be the number of flips Bob makes until he gets a head.
$X$ and $Y$ are independent random variables following a geometric distribution with parameter $p=0.5$ (probability of getting a head).
The probability mass function for a geometric distribution is $P(K=k)=(1-p)^{k-1}p$, for $k=1,2,\dots$.
In this case, $p=0.5$, so $P(K=k)=(0.5)^{k-1}(0.5)=(0.5)^{k}=\frac{1}{2^{k}}$.
We are given the condition that Bob flipped his coin more times than Alice, which means $Y>X$.
We want to find the expected value of $X$ given that $Y>X$, i.e., $E[X|Y>X]$.
The formula for conditional expectation is $E[X|A]=\sum_{x}xP(X=x|A)$.
Also, $P(X=x|A)=\frac{P(X=x\cap A)}{P(A)}$. In our case, $A$ is the event $Y>X$.
We need to find $P(Y>X)$.
$P(Y>X)=\sum_{x=1}^{\infty}\sum_{y=x+1}^{\infty}P(X=x,Y=y)$. Since $X$ and $Y$ are independent, $P(X=x,Y=y)=P(X=x)P(Y=y)=\frac{1}{2^{x}}\frac{1}{2^{y}}=\frac{1}{2^{x+y}}$.
$P(Y>X)=\sum_{x=1}^{\infty}\sum_{y=x+1}^{\infty}\frac{1}{2^{x+y}}=\sum_{x=1}^{\infty}\frac{1}{2^{x}}\sum_{y=x+1}^{\infty}\frac{1}{2^{y}}$. The inner sum is a geometric series: $\sum_{y=x+1}^{\infty}(\frac{1}{2})^{y}=(\frac{1}{2})^{x+1}+(\frac{1}{2})^{x+2}+\dots$
This is a geometric series with first term $a=(\frac{1}{2})^{x+1}$ and common ratio $r=\frac{1}{2}$.
The sum is $\frac{a}{1-r}=\frac{(\frac{1}{2})^{x+1}}{1-\frac{1}{2}}=\frac{(\frac{1}{2})^{x+1}}{\frac{1}{2}}=(\frac{1}{2})^{x}$.
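The closed form for the inner sum can be sanity-checked numerically. The following is a minimal Python sketch and is not part of the original sample; the helper names `geometric_pmf` and `tail_probability` and the truncation length `terms` are illustrative assumptions. It uses exact rational arithmetic so that the truncation error is visible.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def geometric_pmf(k, p=HALF):
    """P(K = k) = (1 - p)^(k - 1) * p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def tail_probability(x, terms=100):
    """Truncated inner sum: P(Y = y) summed over y = x + 1, ..., x + terms."""
    return sum(geometric_pmf(y) for y in range(x + 1, x + terms + 1))

for x in range(1, 6):
    closed_form = HALF ** x
    gap = closed_form - tail_probability(x)
    # The gap is the probability mass of the dropped terms y > x + terms.
    print(x, closed_form, float(gap))
```

The gap between the closed form and the truncated sum shrinks geometrically with `terms`, which is consistent with the series converging to $(\frac{1}{2})^{x}$.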
So, $P(Y>X)=\sum_{x=1}^{\infty}\frac{1}{2^{x}}(\frac{1}{2})^{x}=\sum_{x=1}^{\infty}(\frac{1}{4})^{x}$. This is another geometric series with first term $a=\frac{1}{4}$ and common ratio $r=\frac{1}{4}$.
The sum is $\frac{a}{1-r}=\frac{\frac{1}{4}}{1-\frac{1}{4}}=\frac{\frac{1}{4}}{\frac{3}{4}}=\frac{1}{3}$.
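As a cross-check on the value $\frac{1}{3}$, the double sum can also be evaluated directly without first simplifying the inner series. This is a hedged sketch rather than part of the original trace; the function name `prob_bob_flips_more` and the truncation bounds `max_x` and `max_y` are assumptions chosen for illustration.

```python
from fractions import Fraction

def prob_bob_flips_more(max_x=30, max_y=60):
    """Truncated double sum of 1 / 2^(x + y) over 1 <= x <= max_x and x < y <= max_y."""
    total = Fraction(0)
    for x in range(1, max_x + 1):
        for y in range(x + 1, max_y + 1):
            total += Fraction(1, 2 ** (x + y))
    return total

estimate = prob_bob_flips_more()
# The truncated sum sits slightly below 1/3 because only positive terms are dropped.
print(float(estimate), float(Fraction(1, 3) - estimate))
```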
Now, we need to find $P(X=x\cap Y>X)=P(X=x,Y>x)$.
$P(X=x,Y>x)=P(X=x)P(Y>x)$, due to independence.
$P(Y>x)=\sum_{y=x+1}^{\infty}P(Y=y)=\sum_{y=x+1}^{\infty}\frac{1}{2^{y}}=(\frac{1}{2})^{x+1}+(\frac{1}{2})^{x+2}+\dots=(\frac{1}{2})^{x}$. So, $P(X=x,Y>x)=\frac{1}{2^{x}}\times(\frac{1}{2})^{x}=(\frac{1}{4})^{x}$.
Now, we can find $P(X=x|Y>X)=\frac{P(X=x\cap Y>X)}{P(Y>X)}=\frac{(\frac{1}{4})^{x}}{\frac{1}{3}}=3(\frac{1}{4})^{x}$.
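A quick way to check a derived conditional distribution is to confirm that it sums to one. The sketch below is an illustrative addition, not part of the original sample; `conditional_pmf` is a hypothetical helper that divides the joint probability $(\frac{1}{4})^{x}$ by $P(Y>X)=\frac{1}{3}$.

```python
from fractions import Fraction

def conditional_pmf(x):
    """P(X = x | Y > X) = P(X = x, Y > x) / P(Y > X) = (1/4)^x / (1/3)."""
    joint = Fraction(1, 4) ** x
    return joint / Fraction(1, 3)

# Partial mass over x = 1, ..., 40; the remainder is the dropped tail.
partial_mass = sum(conditional_pmf(x) for x in range(1, 41))
print(conditional_pmf(1), conditional_pmf(2), float(1 - partial_mass))
```

Written as $\frac{3}{4}(\frac{1}{4})^{x-1}$, the same expression is the probability mass function of a geometric distribution with success probability $\frac{3}{4}$.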
Finally, we can calculate the conditional expectation:
$E[X|Y>X]=\sum_{x=1}^{\infty}xP(X=x|Y>X)=\sum_{x=1}^{\infty}x\cdot 3(\frac{1}{4})^{x}=3\sum_{x=1}^{\infty}x(\frac{1}{4})^{x}$. Consider the series $S=\sum_{x=1}^{\infty}xr^{x}$, where $|r|<1$.
$S=r+2r^{2}+3r^{3}+4r^{4}+\dots$
$\frac{S}{r}=1+2r+3r^{2}+4r^{3}+\dots$
Consider the geometric series $G=1+r+r^{2}+r^{3}+\dots=\frac{1}{1-r}$.
Differentiating with respect to $r$:
$\frac{dG}{dr}=1+2r+3r^{2}+4r^{3}+\dots=\frac{d}{dr}(\frac{1}{1-r})=\frac{1}{(1-r)^{2}}$. So, $\frac{S}{r}=\frac{1}{(1-r)^{2}}$, which means $S=\frac{r}{(1-r)^{2}}$.
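The identity $S=\frac{r}{(1-r)^{2}}$ can be compared against partial sums for a few values of $r$ with $|r|<1$. This is a minimal floating-point sketch added for illustration; the function names and the choice of 200 terms are assumptions, not part of the original derivation.

```python
def partial_sum(r, terms=200):
    """Partial sum of x * r^x for x = 1, ..., terms."""
    return sum(x * r ** x for x in range(1, terms + 1))

def closed_form(r):
    """Closed form r / (1 - r)^2, valid for |r| < 1."""
    return r / (1 - r) ** 2

for r in (0.1, 0.25, 0.5, -0.3):
    print(r, partial_sum(r), closed_form(r))
```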
In our case, $r=\frac{1}{4}$.
$\sum_{x=1}^{\infty}x(\frac{1}{4})^{x}=\frac{\frac{1}{4}}{(1-\frac{1}{4})^{2}}=\frac{\frac{1}{4}}{(\frac{3}{4})^{2}}=\frac{\frac{1}{4}}{\frac{9}{16}}=\frac{1}{4}\times\frac{16}{9}=\frac{4}{9}$. Therefore, $E[X|Y>X]=3\times\frac{4}{9}=\frac{12}{9}=\frac{4}{3}$.
Alternative approach: Symmetry.
Consider the events $Y>X$, $X>Y$, and $X=Y$.
$P(Y>X)=P(X>Y)$ due to symmetry. $P(X=Y)=\sum_{k=1}^{\infty}P(X=k,Y=k)=\sum_{k=1}^{\infty}P(X=k)P(Y=k)=\sum_{k=1}^{\infty}(\frac{1}{2^{k}})^{2}=\sum_{k=1}^{\infty}(\frac{1}{4})^{k}=\frac{\frac{1}{4}}{1-\frac{1}{4}}=\frac{\frac{1}{4}}{\frac{3}{4}}=\frac{1}{3}$. We have $P(Y>X)+P(X>Y)+P(X=Y)=1$.
$P(Y>X)+P(Y>X)+\frac{1}{3}=1$. $2P(Y>X)=1-\frac{1}{3}=\frac{2}{3}$. $P(Y>X)=\frac{1}{3}$. This confirms our previous calculation.
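Both the three-way split of probability $\frac{1}{3}$ each and the conditional expectation $\frac{4}{3}$ obtained earlier can be checked empirically. The following Monte Carlo sketch is an illustrative addition, not part of the original sample; the function names, the trial count, and the seed are assumptions, and the estimates carry sampling noise.

```python
import random

def flips_until_heads(rng):
    """Number of fair-coin flips up to and including the first heads."""
    flips = 1
    while rng.random() >= 0.5:  # treat values >= 0.5 as tails
        flips += 1
    return flips

def simulate(trials=200_000, seed=0):
    """Estimate P(Y > X), P(X > Y), P(X = Y) and E[X | Y > X] by simulation."""
    rng = random.Random(seed)
    bob_more = alice_more = ties = 0
    alice_flips_when_bob_more = 0
    for _ in range(trials):
        x = flips_until_heads(rng)  # Alice
        y = flips_until_heads(rng)  # Bob
        if y > x:
            bob_more += 1
            alice_flips_when_bob_more += x
        elif x > y:
            alice_more += 1
        else:
            ties += 1
    return {
        "P(Y>X)": bob_more / trials,
        "P(X>Y)": alice_more / trials,
        "P(X=Y)": ties / trials,
        "E[X|Y>X]": alice_flips_when_bob_more / bob_more,
    }

results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Conditioning is implemented by keeping only the trials with $Y>X$ and averaging Alice's flip count over them, which mirrors the definition $E[X|A]=\sum_{x}xP(X=x|A)$.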
Let $E[X|Y>X]=\mu$. We are looking for the expected number of flips Alice made, given Bob flipped more times.
Consider the case where the first flip for both is tails. Then the process restarts, and the condition remains.
Let $A_{1}$ be Alice’s first flip, $B_{1}$ be Bob’s first flip.
$P(A_{1}=H)=1/2,P(A_{1}=T)=1/2$. $P(B_{1}=H)=1/2,P(B_{1}=T)=1/2$.
Case 1: $A_{1}=H,B_{1}=H$. $X=1,Y=1$, so $Y>X$ is false.
Case 2: $A_{1}=H,B_{1}=T$. Alice stops at 1, Bob continues. $Y>1$.
Case 3: $A_{1}=T,B_{1}=H$. Bob stops at 1, Alice continues.
Case 4: $A_{1}=T,B_{1}=T$. Both continue.
Consider the conditional probabilities given $Y>X$.
Let $E=E[X|Y>X]$.
If the first flips are HH, the condition $Y>X$ is not met.
If the first flips are HT, Alice’s number of flips is 1.
If the first flips are TH, Bob’s number of flips is 1, Alice’s is at least 2, so $Y<X$.
If the first flips are TT, the problem restarts, and the expected number of additional flips for Alice is still $E$.
Let $N_A$ be the number of flips Alice makes, $N_B$ be the number of flips Bob makes.
We are given $N_B>N_A$.
Consider the first flips:
HH (prob 1/4): Condition not met.
HT (prob 1/4): Alice flips 1 time.
TH (prob 1/4): Bob flips 1 time, Alice $>1$, so $N_A>N_B$, condition not met.
TT (prob 1/4): Both continue.
Let $E$ be the expected number of Alice’s flips given Bob flipped more.
If the first flips are HT, Alice flipped 1 time, and the condition is met.
If the first flips are TT, the situation is the same as the beginning, so the expected number of additional flips for Alice is $E$.
Consider the event $Y>X$.
Conditioning on the first flips:
$$E[X|Y>X]=E[X|Y>X,HH]\,P(HH|Y>X)+E[X|Y>X,HT]\,P(HT|Y>X)+E[X|Y>X,TH]\,P(TH|Y>X)+E[X|Y>X,TT]\,P(TT|Y>X)$$
$P(Y>X|HH)=0$. $P(Y>X|HT)=1$. $P(Y>X|TH)=0$. $P(Y>X|TT)$ depends on the future flips.
Let’s use the definition of conditional expectation directly.
$$E[X|Y>X]=\sum_{x=1}^{\infty}x\,P(X=x|Y>X)=\sum_{x=1}^{\infty}x\,\frac{P(X=x,Y>x)}{P(Y>X)}$$
Consider the probabilities of the first flips given $Y>X$.
$P(HH|Y>X)=\frac{P(Y>X|HH)P(HH)}{P(Y>X)}=\frac{0\cdot 1/4}{1/3}=0$.
$P(HT|Y>X)=\frac{P(Y>X|HT)P(HT)}{P(Y>X)}=\frac{1\cdot 1/4}{1/3}=\frac{3}{4}$.
$P(TH|Y>X)=\frac{P(Y>X|TH)P(TH)}{P(Y>X)}=\frac{0\cdot 1/4}{1/3}=0$.
$P(TT|Y>X)=\frac{P(Y>X|TT)P(TT)}{P(Y>X)}$
If the first flips are TT, the future behavior is the same, and the condition $Y>X$ holds if Bob takes more flips than Alice in the subsequent trials.
$P(Y>X|TT)=P(\text{Bob needs more flips than Alice starting from scratch})=P(Y>X)=1/3$.
$P(TT|Y>X)=\frac{(1/3)\cdot(1/4)}{1/3}=\frac{1}{4}$.
Sum of conditional probabilities: $0+3/4+0+1/4=1$.
Now consider the conditional expectations of $X$:
$E[X|Y>X,HH]$: Not relevant since $P(HH|Y>X)=0$.
$E[X|Y>X,HT]$: Given HT and $Y>X$, Alice flipped 1 time. $E=1$.
$E[X|Y>X,TH]$: Not relevant.
$E[X|Y>X,TT]$: Given TT and $Y>X$, Alice has already flipped once, and the expected number of additional flips is $E$. So, the total number of flips is $1+E$.
$E=0\cdot 0+1\cdot(3/4)+\text{undefined}\cdot 0+(1+E)\cdot(1/4)$
$E=0+3/4+0+1/4+E/4$
$E=1+E/4$
$E-E/4=1$
$3E/4=1$
$E=4/3$.
This confirms the previous result.
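The first-step recursion above can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch, not part of the original trace; the function names, the trial count, and the seed are illustrative assumptions. It draws independent flip counts for Alice and Bob, keeps only the runs where Bob flipped more often, and averages Alice's count over those runs.

```python
import random

def flips_until_head(rng, p=0.5):
    """Number of flips up to and including the first head."""
    n = 1
    while rng.random() >= p:  # a tail occurs with probability 1 - p
        n += 1
    return n

def estimate_conditional_mean(trials=200_000, p=0.5, seed=0):
    """Estimate E[X | Y > X] and P(Y > X) by rejection sampling."""
    rng = random.Random(seed)
    total = 0
    kept = 0
    for _ in range(trials):
        x = flips_until_head(rng, p)  # Alice's flip count
        y = flips_until_head(rng, p)  # Bob's flip count
        if y > x:                     # keep only runs satisfying the condition
            total += x
            kept += 1
    return total / kept, kept / trials

mean_x, frac_kept = estimate_conditional_mean()
print(f"E[X | Y > X] ~ {mean_x:.3f}, P(Y > X) ~ {frac_kept:.3f}")
```

With enough trials, the two printed estimates should land close to $4/3$ and $1/3$ respectively, up to sampling noise.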
Let’s think about the intuition. If we have the condition $Y>X$, it means Alice was “lucky” and got a head early, or Bob was “unlucky”.
Consider the random variable $Z=Y-X$. We are given $Z>0$.
We want $E[X|Z>0]$.
Let $p=P(\text{head})=0.5$. The number of tails before the first head follows a geometric distribution with parameter $p$.
$X-1\sim Geom(p)$, $Y-1\sim Geom(p)$.
$P(X-1=k)=(1-p)^{k}p$.
$P(X=x)=(1-p)^{x-1}p$.
Consider the joint distribution of $X$ and $Y$.
$P(X=x,Y=y)=p^{2}(1-p)^{x-1}(1-p)^{y-1}$
Consider the event $Y=X+k$, where $k\geq 1$.
$$P(Y=X+k)=\sum_{x=1}^{\infty}P(X=x,Y=x+k)=\sum_{x=1}^{\infty}p^{2}(1-p)^{x-1}(1-p)^{x+k-1}$$
$$=p^{2}(1-p)^{k-1}\sum_{x=1}^{\infty}((1-p)^{2})^{x-1}=p^{2}(1-p)^{k-1}\frac{1}{1-(1-p)^{2}}$$
$$=p^{2}(1-p)^{k-1}\frac{1}{1-(1-2p+p^{2})}=p^{2}(1-p)^{k-1}\frac{1}{2p-p^{2}}=p^{2}(1-p)^{k-1}\frac{1}{p(2-p)}$$
$$=\frac{p(1-p)^{k-1}}{2-p}.$$
If $p=0.5$, then $1-p=0.5$, $2-p=1.5$.
$P(Y=X+k)=\frac{0.5(0.5)^{k-1}}{1.5}=\frac{(0.5)^{k}}{1.5}=\frac{(1/2)^{k}}{3/2}=\frac{1}{2^{k}}\cdot\frac{2}{3}=\frac{1}{3\cdot 2^{k-1}}$.
Check if $\sum_{k=1}^{\infty}P(Y=X+k)=P(Y>X)=1/3$.
$$\sum_{k=1}^{\infty}\frac{1}{3\cdot 2^{k-1}}=\frac{1}{3}\sum_{k=1}^{\infty}(\frac{1}{2})^{k-1}=\frac{1}{3}(1+1/2+1/4+\dots)=\frac{1}{3}\frac{1}{1-1/2}=\frac{1}{3}\cdot 2=\frac{2}{3}.$$
Something is wrong here.
Let’s recalculate $P(Y>X)$.
$$P(Y>X)=\sum_{x=1}^{\infty}\sum_{y=x+1}^{\infty}(1-p)^{x-1}p(1-p)^{y-1}p$$
$$=p^{2}\sum_{x=1}^{\infty}(1-p)^{x-1}\sum_{y=x+1}^{\infty}(1-p)^{y-1}$$
Inner sum: $(1-p)^{x}+(1-p)^{x+1}+\dots=\frac{(1-p)^{x}}{1-(1-p)}=\frac{(1-p)^{x}}{p}$.
$$P(Y>X)=p^{2}\sum_{x=1}^{\infty}(1-p)^{x-1}\frac{(1-p)^{x}}{p}=p(1-p)\sum_{x=1}^{\infty}(1-p)^{2(x-1)}$$
$$=p(1-p)\sum_{x=1}^{\infty}((1-p)^{2})^{x-1}=p(1-p)\frac{1}{1-(1-p)^{2}}$$
$$=p(1-p)\frac{1}{1-(1-2p+p^{2})}=p(1-p)\frac{1}{2p-p^{2}}=p(1-p)\frac{1}{p(2-p)}=\frac{1-p}{2-p}.$$
If $p=0.5$, $P(Y>X)=\frac{0.5}{1.5}=\frac{1}{3}$. Correct.
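The mismatch between $2/3$ and $1/3$ noted in the trace can be probed numerically. The sketch below is an illustrative check rather than part of the original reasoning; the helper names `pmf` and `prob_gap` and the truncation limits are assumptions. It evaluates the series for $P(Y=X+k)$ directly with exact fractions and prints it next to two candidate closed forms, one with exponent $k$ and one with exponent $k-1$ as written in the trace.

```python
from fractions import Fraction

def pmf(n, p):
    """P(first head on flip n) for a coin with head probability p."""
    return (1 - p) ** (n - 1) * p

def prob_gap(k, p, x_max=200):
    """Truncated series for P(Y = X + k) = sum_x P(X = x) * P(Y = x + k)."""
    return sum(pmf(x, p) * pmf(x + k, p) for x in range(1, x_max + 1))

p = Fraction(1, 2)
for k in (1, 2, 3):
    direct = float(prob_gap(k, p))
    with_k = float(p * (1 - p) ** k / (2 - p))                 # exponent k
    with_k_minus_1 = float(p * (1 - p) ** (k - 1) / (2 - p))   # exponent k - 1, as in the trace
    print(k, direct, with_k, with_k_minus_1)

# Summing the directly computed series over k should approximate P(Y > X).
total = float(sum(prob_gap(k, p) for k in range(1, 60)))
print("sum over k:", total)
```

If the directly computed values track the exponent-$k$ column, that would point to the factoring step $(1-p)^{x-1}(1-p)^{x+k-1}$ as the source of the discrepancy, which is consistent with the recalculated $P(Y>X)=1/3$.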
Now, let’s find $E[X|Y>X]$.
$$E[X|Y>X]=\sum_{x=1}^{\infty}x\,P(X=x|Y>X)=\sum_{x=1}^{\infty}x\,\frac{P(X=x,Y>x)}{P(Y>X)}.$$
$P(X=x,Y>x)=P(X=x)P(Y>x)$ (independence)
$P(X=x)=(1-p)^{x-1}p$.
$P(Y>x)=\sum_{y=x+1}^{\infty}(1-p)^{y-1}p=p[(1-p)^{x}+(1-p)^{x+1}+\dots]=p\frac{(1-p)^{x}}{1-(1-p)}=(1-p)^{x}$.
$P(X=x,Y>x)=(1-p)^{x-1}p(1-p)^{x}=p(1-p)^{2x-1}$.
$$E[X|Y>X]=\frac{1}{P(Y>X)}\sum_{x=1}^{\infty}xp(1-p)^{2x-1}$$
$$=\frac{2-p}{1-p}p(1-p)^{-1}\sum_{x=1}^{\infty}x((1-p)^{2})^{x}$$ (error in power)
$P(Y>x)=\frac{(1-p)^{x}}{1-(1-p)}p=(1-p)^{x}$.
$$E[X|Y>X]=\frac{1}{P(Y>X)}\sum_{x=1}^{\infty}x\,P(X=x)P(Y>x)$$
$$=\frac{1}{P(Y>X)}\sum_{x=1}^{\infty}x(1-p)^{x-1}p(1-p)^{x}$$
$$=\frac{p}{P(Y>X)}\sum_{x=1}^{\infty}x(1-p)^{2x-1}$$
$$=\frac{p}{(1-p)/(2-p)}(1-p)^{-1}\sum_{x=1}^{\infty}x((1-p)^{2})^{x}$$
$$=p\frac{2-p}{(1-p)^{2}}\sum_{x=1}^{\infty}x((1-p)^{2})^{x}.$$
Let $r=(1-p)^{2}$. $\sum_{x=1}^{\infty}xr^{x}=\frac{r}{(1-r)^{2}}$.
$1-r=1-(1-p)^{2}=1-(1-2p+p^{2})=2p-p^{2}=p(2-p)$.
$(1-r)^{2}=p^{2}(2-p)^{2}$.
$\sum_{x=1}^{\infty}x((1-p)^{2})^{x}=\frac{(1-p)^{2}}{p^{2}(2-p)^{2}}$.
$E[X|Y>X]=p\frac{2-p}{(1-p)^{2}}\frac{(1-p)^{2}}{p^{2}(2-p)^{2}}=\frac{1}{p(2-p)}$.
If $p=0.5$, $E[X|Y>X]=\frac{1}{0.5\times 1.5}=\frac{1}{0.75}=\frac{1}{3/4}=4/3$.
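The closed form $E[X|Y>X]=\frac{1}{p(2-p)}$ can be compared against a truncated version of the defining series. This is a small sketch under stated assumptions: the function name `conditional_mean`, the cutoff `x_max`, and the sample values of $p$ are illustrative choices, not taken from the paper.

```python
def conditional_mean(p, x_max=200):
    """Truncated series for E[X | Y > X] with head probability p."""
    q = 1.0 - p
    # P(X = x, Y > x) = p * q**(x - 1) * q**x = p * q**(2x - 1)
    joint = [p * q ** (2 * x - 1) for x in range(1, x_max + 1)]
    numerator = sum(x * w for x, w in enumerate(joint, start=1))
    # Dividing by the sum of the joint terms normalizes by P(Y > X).
    return numerator / sum(joint)

for p in (0.25, 0.5, 0.75):
    series_value = conditional_mean(p)
    closed_form = 1.0 / (p * (2.0 - p))
    print(p, series_value, closed_form)
```

The two printed columns should agree to within floating-point error for each $p$, with the $p=0.5$ row matching the value $4/3$ derived above.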
Solution:
Let $X$ be the number of flips Alice makes until she gets a head, and $Y$ be the number of flips Bob makes until he gets a head. $X$ and $Y$ are independent geometric random variables with parameter $p=0.5$. The probability mass function is $P(K=k)=(1-p)^{k-1}p$.
We are interested in the expected number of times Alice flipped her coin, given that Bob flipped his coin more times than Alice, i.e., $E[X|Y>X]$.
Using the definition of conditional expectation, $E[X|Y>X]=\sum_{x=1}^{\infty}xP(X=x|Y>X)$.
$P(X=x|Y>X)=\frac{P(X=x,Y>x)}{P(Y>X)}$.
First, calculate $P(Y>X)$.
$P(Y>X)=\sum_{x=1}^{\infty}\sum_{y=x+1}^{\infty}P(X=x,Y=y)=\sum_{x=1}^{\infty}P(X=x)P(Y>x)$.
$P(X=x)=(0.5)^{x-1}(0.5)=(0.5)^{x}=\frac{1}{2^{x}}$.
$P(Y>x)=\sum_{y=x+1}^{\infty}(0.5)^{y-1}(0.5)=(0.5)^{x}$.
$P(Y>X)=\sum_{x=1}^{\infty}\frac{1}{2^{x}}(\frac{1}{2})^{x}=\sum_{x=1}^{\infty}(\frac{1}{4})^{x}=\frac{1/4}{1-1/4}=\frac{1/4}{3/4}=\frac{1}{3}$.
Next, calculate $P(X=x,Y>x)=P(X=x)P(Y>x)=\frac{1}{2^{x}}(\frac{1}{2})^{x}=(\frac{1}{4})^{x}$.
Now, find $P(X=x|Y>X)=\frac{P(X=x,Y>x)}{P(Y>X)}=\frac{(1/4)^{x}}{1/3}=3(\frac{1}{4})^{x}$.
Finally, calculate the conditional expectation:
最后,计算条件期望:
E[X|Y>X]=x=1xP(X=x|Y>X)=x=1x3(14)x=3x=1x(14)xE[X|Y>X]=\sum_{x=1}^{\infty}xP(X=x|Y>X)=\sum_{x=1}^{\infty}x\cdot 3(\frac{1}{4% })^{x}=3\sum_{x=1}^{\infty}x(\frac{1}{4})^{x}italic_E [ italic_X | italic_Y > italic_X ] = ∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x italic_P ( italic_X = italic_x | italic_Y > italic_X ) = ∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x ⋅ 3 ( divide start_ARG 1 end_ARG start_ARG 4 end_ARG ) start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT = 3 ∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x ( divide start_ARG 1 end_ARG start_ARG 4 end_ARG ) start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT. Using the formula x=1xrx=r(1r)2\sum_{x=1}^{\infty}xr^{x}=\frac{r}{(1-r)^{2}}∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x italic_r start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT = divide start_ARG italic_r end_ARG start_ARG ( 1 - italic_r ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG with r=1/4r=1/4italic_r = 1 / 4:
使用公式 x=1xrx=r(1r)2superscriptsubscript1superscriptsuperscript12\sum_{x=1}^{\infty}xr^{x}=\frac{r}{(1-r)^{2}}∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x italic_r start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT = divide start_ARG italic_r end_ARG start_ARG ( 1 - italic_r ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,其中 r=1/414r=1/4italic_r = 1 / 4
x=1x(14)x=1/4(11/4)2=1/4(3/4)2=1/49/16=14×169=49\sum_{x=1}^{\infty}x(\frac{1}{4})^{x}=\frac{1/4}{(1-1/4)^{2}}=\frac{1/4}{(3/4)% ^{2}}=\frac{1/4}{9/16}=\frac{1}{4}\times\frac{16}{9}=\frac{4}{9}∑ start_POSTSUBSCRIPT italic_x = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x ( divide start_ARG 1 end_ARG start_ARG 4 end_ARG ) start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT = divide start_ARG 1 / 4 end_ARG start_ARG ( 1 - 1 / 4 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = divide start_ARG 1 / 4 end_ARG start_ARG ( 3 / 4 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = divide start_ARG 1 / 4 end_ARG start_ARG 9 / 16 end_ARG = divide start_ARG 1 end_ARG start_ARG 4 end_ARG × divide start_ARG 16 end_ARG start_ARG 9 end_ARG = divide start_ARG 4 end_ARG start_ARG 9 end_ARG. E[X|Y>X]=3×49=129=43E[X|Y>X]=3\times\frac{4}{9}=\frac{12}{9}=\frac{4}{3}italic_E [ italic_X | italic_Y > italic_X ] = 3 × divide start_ARG 4 end_ARG start_ARG 9 end_ARG = divide start_ARG 12 end_ARG start_ARG 9 end_ARG = divide start_ARG 4 end_ARG start_ARG 3 end_ARG. Final Answer: The final answer is 4/3\boxed{4/3}4 / 3
最终答案:最终答案为 4/343\boxed{4/3}4 / 3
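The closed-form result can be cross-checked numerically. The sketch below is an illustration rather than part of the original sample: it assumes, as the derivation does, that $X$ and $Y$ are independent geometric variables on $\{1,2,\dots\}$ with success probability 0.5, and it truncates the infinite sums at a hypothetical cutoff `N`. The helper names `geometric_pmf` and `tail` are made up for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def geometric_pmf(x, p=HALF):
    """P(X = x) for a geometric variable on {1, 2, ...} with success probability p."""
    return (1 - p) ** (x - 1) * p

def tail(x, p=HALF):
    """P(Y > x) for the same geometric distribution."""
    return (1 - p) ** x

# Truncation point for the infinite sums (an assumption of this sketch);
# the neglected terms shrink like (1/4)**N, so the error is negligible.
N = 200

# P(Y > X) = sum_x P(X = x) P(Y > x)
p_y_gt_x = sum(geometric_pmf(x) * tail(x) for x in range(1, N + 1))

# E[X | Y > X] = sum_x x P(X = x, Y > x) / P(Y > X)
e_x_given = sum(x * geometric_pmf(x) * tail(x) for x in range(1, N + 1)) / p_y_gt_x

print(float(p_y_gt_x))   # should be close to 1/3
print(float(e_x_given))  # should be close to 4/3
```

Exact rational arithmetic via `Fraction` is used so that the only approximation is the truncation of the series.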
Table 11: s1K sample stemming from the astronomy subset of OlympicArena. The black text is the question and the blue text is the solution. Based solely on the question we generate the light blue reasoning trace and blue response with Gemini, which we use for training.
An often-repeated fun fact is that humans produce more power per unit volume than stars. If the sun were the same size, but it produced the same amount of power per unit volume as a human, what would its surface temperature be? Assume the “average human” produces 100 watts of power and has a volume of 66400 cubic centimeters.
A: $3500\mathrm{~K}$ B: $10000\mathrm{~K}$ C: $25000\mathrm{~K}$ D: $40000\mathrm{~K}$ E: $50000\mathrm{~K}$
Using the numbers from the problem, the average human produces
$u=\frac{100}{66400\times 10^{-6}}=1506\mathrm{~W/m^{3}}$
The volume of the sun is $\frac{4}{3}\pi R_{\odot}^{3}$, so its new power output would be $P=\frac{4}{3}\pi R_{\odot}^{3}u$. To find the equilibrium temperature, we use
$P=\sigma AT^{4}=4\pi\sigma R_{\odot}^{2}T^{4}$
Solving for $T$, we get
$T=\sqrt[4]{\frac{R_{\odot}u}{3\sigma}}=\sqrt[4]{\frac{(6.96\times 10^{8})(1506)}{3(5.67\times 10^{-8})}}=49823\approx 50000\mathrm{~K}.$
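The closed-form expression $T=\sqrt[4]{R_{\odot}u/(3\sigma)}$ is straightforward to evaluate directly. The following is a minimal sketch under the solution's own constants ($R_{\odot}=6.96\times 10^{8}$ m, $\sigma=5.67\times 10^{-8}$ W m$^{-2}$ K$^{-4}$); the function name `surface_temperature` is a hypothetical helper introduced only for this illustration.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant in W m^-2 K^-4, as used in the solution
R_SUN = 6.96e8   # solar radius in m, as used in the solution

def surface_temperature(power_density, radius=R_SUN, sigma=SIGMA):
    """Black-body surface temperature of a sphere with uniform power density.

    Equates (4/3) * pi * R^3 * u with 4 * pi * R^2 * sigma * T^4,
    which gives T = (R * u / (3 * sigma)) ** 0.25.
    """
    return (radius * power_density / (3.0 * sigma)) ** 0.25

# Human power density: 100 W over 66400 cm^3, converted to m^3.
u_human = 100.0 / (66400 * 1e-6)

t_sun = surface_temperature(u_human)
print(round(u_human), "W/m^3")
print(round(t_sun), "K")
```

Because $\pi$ and two powers of the radius cancel between the volume and the surface area, the temperature depends only on the product $R_{\odot}u$ and on $\sigma$.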
The problem asks for the surface temperature of the sun if it had the same size but produced power per unit volume as a human.
First, let’s find the power density of a human.
Power produced by a human, $P_{human}=100\mathrm{~W}$
Volume of a human, $V_{human}=66400\mathrm{~cm}^{3}$
Convert the volume to cubic meters:
$1\mathrm{~m}=100\mathrm{~cm}$
$1\mathrm{~m}^{3}=(100\mathrm{~cm})^{3}=10^{6}\mathrm{~cm}^{3}$
$V_{human}=66400\mathrm{~cm}^{3}=66400\times 10^{-6}\mathrm{~m}^{3}=0.0664\mathrm{~m}^{3}$
Power density of a human, $\rho_{power,human}=\frac{P_{human}}{V_{human}}=\frac{100\mathrm{~W}}{0.0664\mathrm{~m}^{3}}$
$\rho_{power,human}=1506.024\mathrm{~W/m^{3}}$
Now, consider the sun with the same size as it currently is, but with a power density equal to that of a human.
We need the radius of the sun to calculate its volume. The average radius of the sun is $R_{sun}=6.95\times 10^{8}\mathrm{~m}$.
Volume of the sun, $V_{sun}=\frac{4}{3}\pi R_{sun}^{3}=\frac{4}{3}\pi(6.95\times 10^{8}\mathrm{~m})^{3}$
$V_{sun}=\frac{4}{3}\pi(337.159375\times 10^{24})\mathrm{~m}^{3}$
$V_{sun}=1410.24\times 10^{24}\times\pi\mathrm{~m}^{3}$
$V_{sun}=4.4307\times 10^{27}\mathrm{~m}^{3}$
If the sun had the same power density as a human, the total power produced by this modified sun would be:
$P_{modified\_sun}=\rho_{power,human}\times V_{sun}$
$P_{modified\_sun}=1506.024\mathrm{~W/m^{3}}\times 4.4307\times 10^{27}\mathrm{~m}^{3}$
$P_{modified\_sun}=6.6726\times 10^{30}\mathrm{~W}$
The surface temperature of a star is related to its luminosity (total power output) through the Stefan-Boltzmann law:
$L=4\pi R^{2}\sigma T^{4}$
where $L$ is the luminosity (total power output), $R$ is the radius of the star, $\sigma$ is the Stefan-Boltzmann constant, and $T$ is the surface temperature.
In this case, the luminosity of the modified sun is $P_{modified\_sun}$. The radius of the modified sun is the same as the actual sun, $R_{sun}$.
The Stefan-Boltzmann constant, $\sigma=5.67\times 10^{-8}\mathrm{~Wm^{-2}K^{-4}}$.
$P_{modified\_sun}=4\pi R_{sun}^{2}\sigma T^{4}$
$6.6726\times 10^{30}\mathrm{~W}=4\pi(6.95\times 10^{8}\mathrm{~m})^{2}\times 5.67\times 10^{-8}\mathrm{~Wm^{-2}K^{-4}}\times T^{4}$
Calculate the term $4\pi R_{sun}^{2}\sigma$:
$R_{sun}^{2}=(6.95\times 10^{8})^{2}=48.3025\times 10^{16}\mathrm{~m}^{2}$
$4\pi R_{sun}^{2}=4\pi\times 48.3025\times 10^{16}=607.08\times 10^{16}\mathrm{~m}^{2}$
$4\pi R_{sun}^{2}\sigma=607.08\times 10^{16}\mathrm{~m}^{2}\times 5.67\times 10^{-8}\mathrm{~Wm^{-2}K^{-4}}$
$4\pi R_{sun}^{2}\sigma=3441.93\times 10^{8}\mathrm{~WK^{-4}}=3.44193\times 10^{11}\mathrm{~WK^{-4}}$
Now, substitute this back into the equation:
$6.6726\times 10^{30}=3.44193\times 10^{11}\times T^{4}$
Solve for $T^{4}$:
$T^{4}=\frac{6.6726\times 10^{30}}{3.44193\times 10^{11}}=1.9386\times 10^{19}$
Solve for $T$:
$T=(1.9386\times 10^{19})^{1/4}=(193.86\times 10^{17})^{1/4}$
$T=(1.9386)^{1/4}\times(10^{19})^{1/4}=(1.9386)^{0.25}\times 10^{4.75}$
Calculate $(1.9386)^{0.25}$:
Let $y=1.9386^{0.25}$, then $\ln y=0.25\ln(1.9386)$
$\ln(1.9386)\approx 0.6619$
$\ln y=0.25\times 0.6619=0.165475$
$y=e^{0.165475}\approx 1.1799$
So, $T=1.1799\times 10^{4.75}=1.1799\times 10^{4}\times 10^{0.75}$
$10^{0.75}=10^{3/4}=(10^{3})^{1/4}=(1000)^{1/4}$
We know $5^{4}=625$ and $6^{4}=1296$, so $(1000)^{1/4}$ is between 5 and 6.
Alternatively, $10^{0.75}=10^{0.5}\times 10^{0.25}=\sqrt{10}\times 10^{0.25}\approx 3.16\times 1.778=5.62$
$T=1.1799\times 10^{4}\times 5.62=6.63\times 10^{4}\mathrm{~K}=66300\mathrm{~K}$
Let’s double-check the calculations.
Power density of human: $\rho_{power,human}=1506.02\mathrm{~W/m^{3}}$
Volume of sun: $V_{sun}=4.43\times 10^{27}\mathrm{~m}^{3}$
Power of modified sun: $P_{modified\_sun}=1506.02\times 4.43\times 10^{27}=6671.67\times 10^{27}=6.67\times 10^{30}\mathrm{~W}$
Stefan-Boltzmann law: $P=4\pi R^{2}\sigma T^{4}$
$T^{4}=\frac{P}{4\pi R^{2}\sigma}$
$4\pi R_{sun}^{2}\sigma=3.44\times 10^{11}$
$T^{4}=\frac{6.67\times 10^{30}}{3.44\times 10^{11}}=1.939\times 10^{19}$
$T=(1.939\times 10^{19})^{1/4}=(193.9\times 10^{17})^{1/4}$
$T=(193.9)^{0.25}\times(10^{17})^{0.25}=(193.9)^{0.25}\times 10^{4.25}$
Calculate $(193.9)^{0.25}$:
Let $y=193.9^{0.25}$, then $\ln y=0.25\ln(193.9)$
$\ln(193.9)\approx 5.267$
$\ln y=0.25\times 5.267=1.31675$
$y=e^{1.31675}\approx 3.73$
$T=3.73\times 10^{4.25}=3.73\times 10^{4}\times 10^{0.25}=37300\times 1.778=66329\mathrm{~K}$
The result is around $66000\mathrm{~K}$. This is not among the options. Let’s review the assumptions and calculations.
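The inversion of the Stefan-Boltzmann law used in this step, $T=\left(P/(4\pi R^{2}\sigma)\right)^{1/4}$, can be evaluated without the logarithm detour. The sketch below is illustrative only: it takes the power and radius quoted in the trace as given inputs without re-deriving them, and `temperature_from_luminosity` is a hypothetical helper name.

```python
import math

SIGMA = 5.67e-8  # Stefan-Boltzmann constant in W m^-2 K^-4, as quoted in the trace

def temperature_from_luminosity(luminosity, radius, sigma=SIGMA):
    """Invert L = 4 * pi * R^2 * sigma * T^4 for the surface temperature T."""
    return (luminosity / (4.0 * math.pi * radius**2 * sigma)) ** 0.25

# Inputs copied from the trace: the total power it computed for the
# modified sun and the solar radius it assumed.
p_modified_sun = 6.6726e30  # W
r_sun = 6.95e8              # m

t_check = temperature_from_luminosity(p_modified_sun, r_sun)
print(round(t_check), "K")
```

With these inputs the function returns roughly 66000 K, so the fourth-root arithmetic is consistent with the power and radius fed into it.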
Assumption: The sun radiates as a black body, which is reasonable for estimating surface temperature.
Consider potential errors in calculations.
Let’s try a simpler approach by setting up ratios.
The power per unit volume of the modified sun is equal to that of a human.
$$\frac{P_{modified\_sun}}{V_{sun}}=\frac{P_{human}}{V_{human}}$$
We also know $P_{modified\_sun}=4\pi R_{sun}^{2}\sigma T_{modified\_sun}^{4}$.
$$\frac{4\pi R_{sun}^{2}\sigma T_{modified\_sun}^{4}}{\frac{4}{3}\pi R_{sun}^{3}}=\frac{P_{human}}{V_{human}}$$
$$\frac{3\sigma T_{modified\_sun}^{4}}{R_{sun}}=\rho_{power,human}$$
$$T_{modified\_sun}^{4}=\frac{\rho_{power,human}R_{sun}}{3\sigma}$$
Substitute the values:
$\rho_{power,human}=1506.02~\mathrm{W/m^{3}}$
$R_{sun}=6.95\times 10^{8}~\mathrm{m}$
$\sigma=5.67\times 10^{-8}~\mathrm{Wm^{-2}K^{-4}}$
$$T_{modified\_sun}^{4}=\frac{1506.02\times 6.95\times 10^{8}}{3\times 5.67\times 10^{-8}}$$
$$T_{modified\_sun}^{4}=\frac{10466.839\times 10^{8}}{17.01\times 10^{-8}}=\frac{1.0466\times 10^{12}}{1.701\times 10^{-7}}=0.6153\times 10^{19}=6.153\times 10^{18}$$
$$T_{modified\_sun}=(6.153\times 10^{18})^{1/4}=(6153\times 10^{15})^{1/4}$$
$$T_{modified\_sun}=(6153)^{0.25}\times(10^{15})^{0.25}=(6153)^{0.25}\times 10^{3.75}$$
Calculate $(6153)^{0.25}$:
Let $y=6153^{0.25}$, then $\ln y=0.25\ln(6153)$
$\ln(6153)\approx 8.724$
$\ln y=0.25\times 8.724=2.181$
$y=e^{2.181}\approx 8.85$
$$T_{modified\_sun}=8.85\times 10^{3.75}=8.85\times 10^{3}\times 10^{0.75}=8850\times 5.62=49737~\mathrm{K}$$
This is close to option E: $50000~\mathrm{K}$.
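The chain of hand arithmetic above can be cross-checked numerically. The following is a minimal sketch, assuming the constants quoted in the trace; the function name `modified_sun_temperature` and the `OPTIONS` table are illustrative and not part of the original.

```python
# Constants as quoted in the reasoning trace (SI units).
P_HUMAN = 100.0        # W, power output of an average human
V_HUMAN = 66400e-6     # m^3, i.e. 66400 cm^3
R_SUN = 6.95e8         # m
SIGMA = 5.67e-8        # W m^-2 K^-4, Stefan-Boltzmann constant

# Answer options from the multiple-choice question (kelvin).
OPTIONS = {"A": 3500, "B": 10000, "C": 25000, "D": 40000, "E": 50000}


def modified_sun_temperature(p_human, v_human, r_sun, sigma):
    """Solve rho = 3 * sigma * T^4 / R for T, with rho = p_human / v_human."""
    rho = p_human / v_human
    return (rho * r_sun / (3.0 * sigma)) ** 0.25


temperature = modified_sun_temperature(P_HUMAN, V_HUMAN, R_SUN, SIGMA)
# Pick the option whose value is nearest to the computed temperature.
closest = min(OPTIONS, key=lambda k: abs(OPTIONS[k] - temperature))
print(f"T = {temperature:.0f} K, closest option: {closest}")
```

Working in floating point avoids the intermediate rounding of the manual steps, so the result may differ from the hand-computed figure by a few tens of kelvin.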
Let’s re-check the units and calculations carefully.
Power density of human: $\frac{100~\mathrm{W}}{0.0664~\mathrm{m}^{3}}=1506.02~\mathrm{W/m^{3}}$
The formula derived was $T^{4}=\frac{\rho_{power,human}R_{sun}}{3\sigma}$.
Units check:
$[\rho_{power,human}]=\mathrm{W/m^{3}}$
$[R_{sun}]=\mathrm{m}$
$[\sigma]=\mathrm{Wm^{-2}K^{-4}}$
$$\frac{(\mathrm{W/m^{3}})\times\mathrm{m}}{\mathrm{Wm^{-2}K^{-4}}}=\frac{\mathrm{W/m^{2}}}{\mathrm{Wm^{-2}K^{-4}}}=\frac{\mathrm{W}}{\mathrm{m^{2}}}\times\frac{\mathrm{m^{2}}\mathrm{K^{4}}}{\mathrm{W}}=\mathrm{K^{4}}$$
The units match for $T^{4}$.
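The same units check can be done mechanically by tracking exponents. This is a small sketch under the assumption that each quantity is represented as a dictionary of unit exponents; the helper `combine` is hypothetical and written only for this illustration.

```python
def combine(*factors):
    """Multiply unit factors; each factor is (units, power), units being a dict of exponents."""
    result = {}
    for units, power in factors:
        for symbol, exponent in units.items():
            result[symbol] = result.get(symbol, 0) + exponent * power
    # Drop symbols whose exponents cancelled out.
    return {s: e for s, e in result.items() if e != 0}


RHO = {"W": 1, "m": -3}              # power density, W/m^3
RADIUS = {"m": 1}                    # radius, m
SIGMA = {"W": 1, "m": -2, "K": -4}   # Stefan-Boltzmann constant, W m^-2 K^-4

# Units of rho * R / sigma: numerator factors get power +1, sigma gets -1.
t4_units = combine((RHO, 1), (RADIUS, 1), (SIGMA, -1))
print(t4_units)
```

Only the kelvin exponent survives, with value 4, which agrees with the manual check.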
Numerical calculation check:
Numerator: $1506.02\times 6.95\times 10^{8}=10466.839\times 10^{8}=1.0466\times 10^{12}$
Denominator: $3\times 5.67\times 10^{-8}=17.01\times 10^{-8}$
Ratio: $\frac{1.0466\times 10^{12}}{17.01\times 10^{-8}}=0.061528\times 10^{20}=6.1528\times 10^{18}$
$T^{4}=6.1528\times 10^{18}$
$T=(6.1528\times 10^{18})^{1/4}=(6152.8\times 10^{15})^{1/4}$
$T=(6152.8)^{0.25}\times 10^{3.75}$
Let’s use logarithms for precision.
$\log_{10}(6152.8)\approx 3.789$
$0.25\times 3.789=0.94725$
$10^{0.94725}\approx 8.856$
$T=8.856\times 10^{3.75}=8.856\times 10^{3}\times 10^{0.75}=8856\times 5.623=49806~\mathrm{K}$
This is very close to $50000~\mathrm{K}$.
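The logarithm route used above can be reproduced in a few lines. This is an illustrative sketch only; the variable names are assumptions, and the value of $T^{4}$ is the one from the ratio step.

```python
import math

t4 = 6.1528e18  # K^4, value from the ratio step

# Split t4 into mantissa * 10^15 so the mantissa matches the trace (6152.8).
mantissa = t4 / 1e15
log_mantissa = math.log10(mantissa)          # about 3.789
root_mantissa = 10 ** (0.25 * log_mantissa)  # fourth root of the mantissa
root_power = 10 ** (15 / 4)                  # fourth root of 10^15

t_via_logs = root_mantissa * root_power
t_direct = t4 ** 0.25
print(f"{t_via_logs:.0f} K via logarithms, {t_direct:.0f} K directly")
```

Both routes should agree up to floating-point error; any gap in a hand calculation comes from truncating the logarithm to three decimals.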
Possible sources of minor discrepancies: rounding errors, precision of constants.
Let’s verify the formula derivation.
Power per unit volume $=\frac{\text{Total Power}}{\text{Volume}}$
$\rho_{power}=\frac{L}{V}$
$L=4\pi R^{2}\sigma T^{4}$
$V=\frac{4}{3}\pi R^{3}$
$$\rho_{power}=\frac{4\pi R^{2}\sigma T^{4}}{\frac{4}{3}\pi R^{3}}=\frac{3\sigma T^{4}}{R}$$
If the modified sun has the power density of a human:
$$\rho_{power,human}=\frac{3\sigma T_{modified\_sun}^{4}}{R_{sun}}$$
$$T_{modified\_sun}^{4}=\frac{\rho_{power,human}R_{sun}}{3\sigma}$$
The formula derivation seems correct.
Let’s use approximate values to check the order of magnitude.
$\rho_{power,human}\approx 10^{3}~\mathrm{W/m^{3}}$
$R_{sun}\approx 7\times 10^{8}~\mathrm{m}$
$\sigma\approx 6\times 10^{-8}~\mathrm{Wm^{-2}K^{-4}}$
$$T^{4}\approx\frac{10^{3}\times 7\times 10^{8}}{3\times 6\times 10^{-8}}=\frac{7\times 10^{11}}{18\times 10^{-8}}=0.38\times 10^{19}=3.8\times 10^{18}$$
$T\approx(3.8\times 10^{18})^{1/4}=(3800\times 10^{15})^{1/4}$
$T\approx(3800)^{0.25}\times 10^{3.75}$
$(3800)^{0.25}$: $7^{4}=2401$, $8^{4}=4096$. So, between 7 and 8.
$\approx 7.8$
$T\approx 7.8\times 10^{3.75}=7800\times 5.6=43680~\mathrm{K}$
The order of magnitude seems consistent with the detailed calculation.
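Because $T$ depends on the fourth root of the ratio, a sizeable error in $T^{4}$ shrinks considerably in $T$. A short sketch using the two $T^{4}$ values from the trace illustrates the effect; the variable names are illustrative.

```python
t4_rough = 3.8e18        # K^4, from the rounded constants
t4_detailed = 6.1528e18  # K^4, from the detailed calculation

ratio_t4 = t4_rough / t4_detailed  # relative size of the rough T^4
ratio_t = ratio_t4 ** 0.25         # relative size of the rough T

print(f"T^4 ratio: {ratio_t4:.2f}, T ratio: {ratio_t:.2f}")
```

An underestimate of almost 40% in $T^{4}$ leaves $T$ only about 11% low, which is why the rough estimate still lands in the same range as the detailed one.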
The problem statement is straightforward and doesn’t seem to have any hidden complexities. The assumptions made are standard for such problems.
Consider the precision of the given values.
Power of human: 100 W (exact)
Volume of human: $66400~\mathrm{cm}^{3}$ (exact)
The result obtained is very close to option E. Given the nature of multiple-choice questions, the closest option is likely the intended answer.
If there were any misinterpretations of the question, it would likely be in the setup. However, the phrase “If the sun were the same size, but it produced the same amount of power per unit volume as a human” seems clear.
The surface temperature is calculated using the Stefan-Boltzmann law, assuming black-body radiation.
Double-check the provided options:
A: $3500~\mathrm{K}$
B: $10000~\mathrm{K}$
C: $25000~\mathrm{K}$
D: $40000~\mathrm{K}$
E: $50000~\mathrm{K}$
Our calculated value is approximately $49800~\mathrm{K}$, which is closest to $50000~\mathrm{K}$.
If we used more precise values for the constants, the result might be slightly different, but likely not enough to shift to another option.
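As a quick check on this claim, the sketch below repeats the calculation with more precise constants. The CODATA value of the Stefan-Boltzmann constant and the IAU nominal solar radius are assumptions brought in for illustration; they do not appear in the trace.

```python
def temperature(rho, r_sun, sigma):
    """Surface temperature from rho = 3 * sigma * T^4 / R."""
    return (rho * r_sun / (3.0 * sigma)) ** 0.25


RHO_HUMAN = 100.0 / 0.0664  # W/m^3, as in the trace

# Rounded constants used in the trace.
t_rounded = temperature(RHO_HUMAN, 6.95e8, 5.67e-8)
# More precise constants (assumed: CODATA sigma, IAU nominal solar radius).
t_precise = temperature(RHO_HUMAN, 6.957e8, 5.670374419e-8)

options = {"A": 3500, "B": 10000, "C": 25000, "D": 40000, "E": 50000}
closest = min(options, key=lambda k: abs(options[k] - t_precise))
print(f"rounded: {t_rounded:.0f} K, precise: {t_precise:.0f} K, option {closest}")
```

Under these assumed constants the temperature shifts by only a few tens of kelvin at most, far less than the spacing between options.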
The fun fact mentioned in the question implies that the power density of a human is higher than that of the sun. Let’s verify this.
Power output of the sun: $3.846\times 10^{26}~\mathrm{W}$
Volume of the sun: $4.43\times 10^{27}~\mathrm{m}^{3}$
Power density of the sun: $\frac{3.846\times 10^{26}}{4.43\times 10^{27}}=0.868~\mathrm{W/m^{3}}$
Power density of human: $1506~\mathrm{W/m^{3}}$
Indeed, the power density of a human is significantly higher than that of the sun.
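The comparison can also be recomputed from first principles. The sketch below is illustrative only: it takes the solar power output quoted above and derives the volume from the radius $R_{sun}=6.95\times 10^{8}~\mathrm{m}$ used earlier in the trace, rather than reusing the quoted volume.

```python
import math

L_SUN = 3.846e26            # W, solar power output quoted in the trace
R_SUN = 6.95e8              # m, solar radius used earlier in the trace
RHO_HUMAN = 100.0 / 0.0664  # W/m^3, human power density

v_sun = 4.0 / 3.0 * math.pi * R_SUN ** 3  # volume of a sphere of radius R_SUN
rho_sun = L_SUN / v_sun                   # solar power density

print(f"V_sun = {v_sun:.3e} m^3")
print(f"rho_sun = {rho_sun:.3f} W/m^3")
print(f"human / sun = {RHO_HUMAN / rho_sun:.0f}")
```

Computed this way, the solar volume and power density come out smaller than the figures quoted in the trace, but the comparison still favors the human by more than three orders of magnitude.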
Final check of the calculation steps and logic. The approach of equating the power density of the modified sun to that of a human and then using the Stefan-Boltzmann law seems correct.
Solution:
The power produced by an average human is $P_{human}=100~\mathrm{W}$, and the volume of an average human is $V_{human}=66400~\mathrm{cm}^{3}=0.0664~\mathrm{m}^{3}$.
The power density of a human is $\rho_{power,human}=\frac{P_{human}}{V_{human}}=\frac{100~\mathrm{W}}{0.0664~\mathrm{m}^{3}}\approx 1506.02~\mathrm{W/m^{3}}$.
If the sun were the same size but produced the same amount of power per unit volume as a human, its power density would be $\rho_{power,modified\_sun}=\rho_{power,human}$.
The radius of the sun is $R_{sun}=6.95\times 10^{8}~\mathrm{m}$.
The volume of the sun is $V_{sun}=\frac{4}{3}\pi R_{sun}^{3}$.
The power density of the modified sun is given by $\rho_{power,modified\_sun}=\frac{P_{modified\_sun}}{V_{sun}}$.
We also know that $P_{modified\_sun}=4\pi R_{sun}^{2}\sigma T^{4}$, where $T$ is the surface temperature of the modified sun.
Equating the power density:
$$\rho_{power,human}=\frac{4\pi R_{sun}^{2}\sigma T^{4}}{\frac{4}{3}\pi R_{sun}^{3}}=\frac{3\sigma T^{4}}{R_{sun}}$$
Solving for $T^{4}$:
$$T^{4}=\frac{\rho_{power,human}R_{sun}}{3\sigma}$$
Substitute the values:
$$T^{4}=\frac{1506.02~\mathrm{W/m^{3}}\times 6.95\times 10^{8}~\mathrm{m}}{3\times 5.67\times 10^{-8}~\mathrm{Wm^{-2}K^{-4}}}$$
$$T^{4}=\frac{1.0466\times 10^{12}}{1.701\times 10^{-7}}=6.1528\times 10^{18}~\mathrm{K^{4}}$$
Solving for $T$:
$$T=(6.1528\times 10^{18})^{1/4}=(6152.8)^{0.25}\times 10^{18/4}=(6152.8)^{0.25}\times 10^{4.5}$$
Using logarithms:
$\log_{10}(6152.8)\approx 3.789$
$0.25\times 3.789=0.94725$
$(6152.8)^{0.25}=10^{0.94725}\approx 8.856$
$$T=8.856\times 10^{4.5}=8.856\times 10^{4}\times 10^{0.5}=88560\times\sqrt{10}\approx 88560\times 3.16=280000~\mathrm{K}$$
Let’s redo the exponent calculation: $18/4=4.5$.
$T=8.856\times 10^{4.5}=8.856\times 10^{4}\times 10^{0.5}$ - error in previous step.
$T=(6.1528\times 10^{18})^{0.25}=49806~\mathrm{K}$
The surface temperature would be approximately $49806~\mathrm{K}$, which is closest to $50000~\mathrm{K}$.
Final Answer: The final answer is E\boxed{E}italic_E
最终答案:最终答案为 E\boxed{E}italic_E

Appendix E Test-time scaling details

E.1 Sequential scaling ablations

Left (token instruction format):

    <|im_start|>user
    What is the answer to Life, the Universe and Everything?
    Think for up to 2048 tokens.
    <|im_start|>assistant
    <|im_start|>think
    Let me break down this question into the three parts it is asking for: 1) Life 2) Universe 3) Everything
    Let me start with life…
    <|im_start|>answer
    The answer is…

Right (step instruction format):

    <|im_start|>user
    What is the answer to Life, the Universe and Everything?
    Think for up to 64 steps.
    <|im_start|>assistant
    <|im_start|>64 steps left
    Let me break down this question into the three parts it is asking for: 1) Life 2) Universe 3) Everything
    <|im_start|>63 steps left
    Let me start with life…
    <|im_start|>answer
    The answer is…

Figure 10: Token and step instruction data formats for controlling test-time compute. We only train our model on the reasoning trace and the answer.
Table 12: Scaling thinking time via token-conditional control. All metrics are averaged over the 30 questions in AIME24.

| Tokens instructed ($\rightarrow$) | 1024 | 2048 | 4096 | 8192 | 16384 |
| --- | --- | --- | --- | --- | --- |
| **No intervention at test-time** | | | | | |
| Thinking tokens | 7939 | 7158 | 8263 | 7108 | 7500 |
| Answer tokens | 689 | 669 | 659 | 722 | 724 |
| AIME24 | 26.7 | 30.0 | 33.3 | 33.3 | 40.0 |
| **Forcing end of thinking when token budget is reached** | | | | | |
| Thinking tokens | 1024 | 2048 | 4031 | 5664 | 6330 |
| Answer tokens | 15 | 15 | 142 | 722 | 691 |
| AIME24 | 3.3 | 30.0 | 33.3 | 33.3 | 40.0 |
Table 13: Scaling thinking time via step-conditional control. All metrics are averaged over the 30 samples in AIME24. Token counts ignore the thinking and step delimiters.

| Steps instructed ($\rightarrow$) | 16 | 32 | 64 | 128 | 256 |
| --- | --- | --- | --- | --- | --- |
| **No intervention at test-time** | | | | | |
| Steps used | 123 | 90 | 80 | 82 | 136 |
| Tokens per step | 60 | 70 | 69 | 66 | 56 |
| Thinking tokens | 7252 | 6277 | 5396 | 5552 | 7551 |
| Answer tokens | 665 | 653 | 735 | 777 | 754 |
| AIME24 | 33.3 | 23.3 | 33.3 | 36.7 | 33.3 |
| **Forcing end of thinking when 0 steps are reached** | | | | | |
| Steps used | 16 | 32 | 59 | 78 | 136 |
| Tokens per step | 96 | 94 | 80 | 70 | 56 |
| Thinking tokens | 1517 | 2963 | 4636 | 5409 | 7551 |
| Answer tokens | 1111 | 788 | 799 | 794 | 754 |
| AIME24 | 23.3 | 23.3 | 33.3 | 36.7 | 33.3 |
Table 14: Scaling thinking time via class-conditional control. We report “accuracy / average thinking tokens per sample”; the higher the accuracy and the fewer the thinking tokens (inference cost) the better.

| Prompt appended to the question after two newlines | AIME24 | MATH500 | GPQA |
| --- | --- | --- | --- |
| Answer after a short amount of thinking. Do not spend excessive time double-checking your work. | 30.0% / 8033 | 90.4% / 2537 | 56.6% / 4177 |
| Answer after a long amount of thinking. If you feel like you are finished early, spend the extra time trying to double-check your work until you are absolutely sure that you have the correct answer. | 36.7% / 9651 | 91.4% / 3875 | 51.0% / 4827 |
| Without generic prompt appending | 50.0% / 6109 | 93.0% / 3298 | 57.6% / 3510 |

Token-conditional control

One general approach is to tell the model in the prompt precisely how many tokens it should generate. Ideally, the model keeps track of its token count and adjusts its generation to finish within the desired limit. We test this approach by training a model with token instructions using the format in Figure 10 (left). We bucket the lengths of the reasoning traces from our 1,000 training examples into powers of two (rounded upwards) and add a corresponding instruction to the user prompt. For example, if the instruction says “Think for up to 2048 tokens”, then the reasoning trace has between 1024 and 2048 tokens. Table 12 shows that, after training, the model hardly follows the token instruction: it sometimes generates more tokens when given a higher limit, but it often overshoots the limit. This may not be unique to our model, as prior work suggests that OpenAI o1-mini also cannot follow token instructions (Zhang & Chen, 2024). To prevent the model from exceeding the limit, we apply budget forcing to end the thinking once the limit is reached. This leads to perfect control (Table 12 (lower)). With budget forcing, the scaling trend is also clearer, as the model can no longer overshoot the limit when given a small thinking budget. This leads to better test-time scaling values for Token Prompting + budget forcing in Table 3. To compute the Control metric reported in Table 3 for token-conditional control variants, we divide the number of evaluation runs in which the thinking tokens in Table 12 stay within the upper limit by the total number of runs (2/5 without intervention; 5/5 with intervention).
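The bucketing and the Control computation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the released code: the function names are hypothetical, and the comparison uses "less than or equal", an assumption made because the budget-forced runs in Table 12 sit exactly at the limit (1024 and 2048) yet count as within budget.

```python
def bucket_upper_limit(n_tokens: int) -> int:
    """Round a reasoning-trace length up to the next power of two."""
    if n_tokens < 1:
        raise ValueError("trace length must be positive")
    return 1 << (n_tokens - 1).bit_length()


def token_instruction(n_tokens: int) -> str:
    """Instruction string appended to the user prompt (wording from Figure 10)."""
    return f"Think for up to {bucket_upper_limit(n_tokens)} tokens."


def control(limits: list[int], thinking_tokens: list[int]) -> float:
    """Fraction of evaluation runs whose average thinking tokens stay within the limit.

    Assumption: a run exactly at the limit counts as within budget.
    """
    within = sum(t <= limit for limit, t in zip(limits, thinking_tokens))
    return within / len(limits)


# Values copied from Table 12.
LIMITS = [1024, 2048, 4096, 8192, 16384]
NO_INTERVENTION = [7939, 7158, 8263, 7108, 7500]
BUDGET_FORCED = [1024, 2048, 4031, 5664, 6330]
```

With the Table 12 values, `control(LIMITS, NO_INTERVENTION)` reproduces the 2/5 and `control(LIMITS, BUDGET_FORCED)` the 5/5 stated in the text.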

Step-conditional control

Token instructions fail because current models cannot count tokens. To accommodate this limitation, we make the counting more coarse-grained: we partition the reasoning traces into steps and ask the model to think for a specific number of steps rather than tokens. We split our reasoning traces into steps on double newlines, which we find act as intuitive separators based on manual inspection of samples. We bucket our training samples into powers of 2 depending on their number of steps and add a corresponding step instruction following the format in Figure 10 (right). This format is based on early experiments, where we found the model more likely to adhere to the step limit when counting down (“3 steps left…2 steps left”) rather than counting up (“Step2…Step3…”). This is likely because, when counting down, the final step is always 1, which acts as a strong prior for the model to finish its generation. When counting up, the final step before the answer varies, so if the model does not remember the original step instruction, it may fail to stop. We conclude the following from our results in Table 13: (1) The model still struggles to adhere to the step limit. The model sometimes simply continues counting into negative steps, e.g. “-1 steps left”. To solve this issue, we automatically stop the thinking process once 0 steps are reached and then force the model to transition to answering mode by appending the answer token delimiter (§3). This leads to perfect step adherence (lower half of Table 13), yet problems remain. (2) The model compensates for fewer steps by making each step longer. For example, when forced to use up to 16 steps vs 256 steps, the model generates an average of 96 tokens per step vs 56. Despite this issue, more steps still clearly correlate with more total thinking tokens in Table 13 and with better performance, leading to a positive slope. (3) Step instructions are more costly than other methods. The step delimiters require around 6 tokens each, which for e.g. 64 steps adds up to a total of around 380 tokens. Even when ignoring the step delimiters in token counts as in Table 13, the model still requires 7551 thinking tokens on average to achieve only 33.3% on AIME24.

To compute the Control metric reported in Table 3 for step-conditional control variants, we first set 100 tokens as an upper limit per step and then multiply this number by the steps instructed to arrive at a proxy total token limit, e.g. 1600 for 16 steps instructed. We then check whether the thinking tokens in Table 13 fit within the respective limit for each evaluation run (3/5 without intervention; 5/5 with intervention). For the model in Figure 7, we use a model with step-conditional control trained on an earlier version of our data and using an earlier version of our evaluation codebase.
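As a rough illustration of the step-conditional setup, the sketch below splits a trace into steps on double newlines, derives the bucketed step instruction, and computes the proxy-limit Control metric from the Table 13 values. The function names are hypothetical and the handling of empty segments is an assumption; only the 100-tokens-per-step proxy and the table values come from the text.

```python
TOKENS_PER_STEP_LIMIT = 100  # proxy upper limit per step, as stated in the text


def split_steps(trace: str) -> list[str]:
    """Split a reasoning trace into steps on double newlines.

    Assumption: empty segments are dropped.
    """
    return [step for step in trace.split("\n\n") if step.strip()]


def step_instruction(n_steps: int) -> str:
    """Bucket the step count to the next power of two (wording from Figure 10)."""
    if n_steps < 1:
        raise ValueError("a trace needs at least one step")
    bucket = 1 << (n_steps - 1).bit_length()
    return f"Think for up to {bucket} steps."


def step_control(steps_instructed: list[int], thinking_tokens: list[int]) -> float:
    """Fraction of runs whose thinking tokens fit within 100 tokens per instructed step."""
    within = sum(
        tokens <= TOKENS_PER_STEP_LIMIT * steps
        for steps, tokens in zip(steps_instructed, thinking_tokens)
    )
    return within / len(steps_instructed)


# Values copied from Table 13.
STEPS = [16, 32, 64, 128, 256]
NO_INTERVENTION = [7252, 6277, 5396, 5552, 7551]
FORCED_AT_ZERO = [1517, 2963, 4636, 5409, 7551]
```

Here `step_control(STEPS, NO_INTERVENTION)` gives 3/5 and `step_control(STEPS, FORCED_AT_ZERO)` gives 5/5, matching the counts reported in the text.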

Class-conditional control

OpenAI exposes test-time compute control to users via a “reasoning_effort” API parameter with three possible settings: low, medium, and high (see https://github.com/openai/openai-python/blob/44d6210f101abedeb2dd68507fcffcb329df70ea/src/openai/types/chat/completion_create_params.py#L172). The OpenAI documentation also states that “Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response,” which suggests that they are unable to control test-time compute with guarantees. OpenAI may thus simply adjust the prompt or system instruction depending on the reasoning effort desired. In Table 14, we show that separate prompts for short and long thinking allow us to control thinking time to some extent: prompting the model to think for longer leads to longer thinking. However, it does not reliably improve performance, and control is not precise. The current adherence to control may suffice when we only have three classes, but it might not scale to finer-grained classes. To compute the Control metric reported in Table 3 for this method, we assume that prompting the model to think for a short time in Table 14 should produce fewer tokens than the default for AIME24, while the long prompt should produce more. As $8033>6109$ and $9651>6109$, only one of the two prompts follows our expected control, so Control is 50%.
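The class-conditional setup and its Control computation are simple enough to state as code. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the prompt strings and token counts are copied from Table 14 (AIME24 column).

```python
SHORT_PROMPT = (
    "Answer after a short amount of thinking. "
    "Do not spend excessive time double-checking your work."
)
LONG_PROMPT = (
    "Answer after a long amount of thinking. If you feel like you are finished "
    "early, spend the extra time trying to double-check your work until you are "
    "absolutely sure that you have the correct answer."
)


def build_prompt(question: str, class_prompt: str = "") -> str:
    """Append the class prompt to the question after two newlines (Table 14 header)."""
    return f"{question}\n\n{class_prompt}" if class_prompt else question


def class_control(default_tokens: int, short_tokens: int, long_tokens: int) -> float:
    """Fraction of class prompts that move thinking tokens in the expected direction."""
    checks = [
        short_tokens < default_tokens,  # short prompt should reduce thinking
        long_tokens > default_tokens,   # long prompt should increase thinking
    ]
    return sum(checks) / len(checks)


# AIME24 average thinking tokens from Table 14: default 6109, short 8033, long 9651.
AIME24_CONTROL = class_control(6109, 8033, 9651)
```

The short prompt fails its check (8033 is not below 6109) while the long prompt passes, so `AIME24_CONTROL` evaluates to 0.5, the 50% reported in the text.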

E.2 Examples for rejection sampling ablation

Table 15: Answers generated by an earlier version of s1-32B when rejection sampling on AIME24. The black text is the question and the yellow text is the solution. Based solely on the question, we generate the light blue reasoning trace and blue response with s1-32B using a temperature of 1. We generate until the reasoning trace has less than 4000 tokens for the first and less than 8000 tokens for the second example. It required on average 199 (1) tries per example until thinking was $\leq 4000$ ($\leq 8000$) tokens.
Let $O(0,0)$, $A(\tfrac{1}{2},0)$, and $B(0,\tfrac{\sqrt{3}}{2})$ be points in the coordinate plane. Let $\mathcal{F}$ be the family of segments $\overline{PQ}$ of unit length lying in the first quadrant with $P$ on the $x$-axis and $Q$ on the $y$-axis. There is a unique point $C$ on $\overline{AB}$, distinct from $A$ and $B$, that does not belong to any segment from $\mathcal{F}$ other than $\overline{AB}$. Then $OC^{2}=\tfrac{p}{q}$, where $p$ and $q$ are relatively prime positive integers. Find $p+q$.
23
[Correct generation with 3981 thinking tokens and 1055 answer tokens]
The problem asks for a point CCitalic_C on the segment AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG that is not contained in any other segment from the family \mathcal{F}caligraphic_F, except for AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG itself.
问题要求在段 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 上找到一个点 CCitalic_C ,该点不属于家族 \mathcal{F}caligraphic_F 中的任何其他段,除了 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 本身。
The points O,A,BO,A,Bitalic_O , italic_A , italic_B form a triangle in the first quadrant.
O,A,BO,A,Bitalic_O , italic_A , italic_B 在第一象限形成一个三角形。
O=(0,0)O=(0,0)italic_O = ( 0 , 0 ) A=(12,0)A=(\tfrac{1}{2},0)italic_A = ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) B=(0,32)B=(0,\tfrac{\sqrt{3}}{2})italic_B = ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG ) The segment AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG connects A(12,0)A(\tfrac{1}{2},0)italic_A ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) and B(0,32)B(0,\tfrac{\sqrt{3}}{2})italic_B ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG ).
该段 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 连接 A(12,0)120A(\tfrac{1}{2},0)italic_A ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 )B(0,32)032B(0,\tfrac{\sqrt{3}}{2})italic_B ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG )
The equation of the line containing AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG can be found using the two points:
直线的方程可以通过这两个点求得:
Slope m=320012=3m=\frac{\frac{\sqrt{3}}{2}-0}{0-\frac{1}{2}}=-\sqrt{3}italic_m = divide start_ARG divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG - 0 end_ARG start_ARG 0 - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_ARG = - square-root start_ARG 3 end_ARG  斜率 m=320012=33200123m=\frac{\frac{\sqrt{3}}{2}-0}{0-\frac{1}{2}}=-\sqrt{3}italic_m = divide start_ARG divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG - 0 end_ARG start_ARG 0 - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_ARG = - square-root start_ARG 3 end_ARG Using the point-slope form yy1=m(xx1)y-y_{1}=m(x-x_{1})italic_y - italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_m ( italic_x - italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) with point A(12,0)A(\tfrac{1}{2},0)italic_A ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ):
使用点斜式方程 yy1=m(xx1)subscript1subscript1y-y_{1}=m(x-x_{1})italic_y - italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_m ( italic_x - italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ,以点 A(12,0)120A(\tfrac{1}{2},0)italic_A ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) 为例:
y0=3(x12)y-0=-\sqrt{3}(x-\tfrac{1}{2})italic_y - 0 = - square-root start_ARG 3 end_ARG ( italic_x - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) y=3x+32y=-\sqrt{3}x+\frac{\sqrt{3}}{2}italic_y = - square-root start_ARG 3 end_ARG italic_x + divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG The segment AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG lies in the first quadrant and connects (12,0)(\tfrac{1}{2},0)( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) to (0,32)(0,\tfrac{\sqrt{3}}{2})( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG ).
该段 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 位于第一象限,连接 (12,0)120(\tfrac{1}{2},0)( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 )(0,32)032(0,\tfrac{\sqrt{3}}{2})( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG )
The family \mathcal{F}caligraphic_F consists of segments PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG of unit length, where PPitalic_P is on the xxitalic_x-axis and QQitalic_Q is on the yyitalic_y-axis. Let P=(a,0)P=(a,0)italic_P = ( italic_a , 0 ) and Q=(0,b)Q=(0,b)italic_Q = ( 0 , italic_b ).
该家族 \mathcal{F}caligraphic_F 由长度为 PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG 的段组成,其中 PPitalic_P 位于 xxitalic_x 轴上, QQitalic_Q 位于 yyitalic_y 轴上。设 P=(a,0)0P=(a,0)italic_P = ( italic_a , 0 )Q=(0,b)0Q=(0,b)italic_Q = ( 0 , italic_b )
The length of PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG is (a0)2+(0b)2=a2+b2=1\sqrt{(a-0)^{2}+(0-b)^{2}}=\sqrt{a^{2}+b^{2}}=1square-root start_ARG ( italic_a - 0 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( 0 - italic_b ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = square-root start_ARG italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = 1.
PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG 的长度为 (a0)2+(0b)2=a2+b2=1superscript02superscript02superscript2superscript21\sqrt{(a-0)^{2}+(0-b)^{2}}=\sqrt{a^{2}+b^{2}}=1square-root start_ARG ( italic_a - 0 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( 0 - italic_b ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = square-root start_ARG italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = 1
So a2+b2=1a^{2}+b^{2}=1italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1.  所以 a2+b2=1superscript2superscript21a^{2}+b^{2}=1italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1 . Since PPitalic_P is on the positive xxitalic_x-axis and QQitalic_Q is on the positive yyitalic_y-axis, we have a>0a>0italic_a > 0 and b>0b>0italic_b > 0.
由于 PPitalic_P 位于正 xxitalic_x 轴上,而 QQitalic_Q 位于正 yyitalic_y 轴上,因此我们有 a>00a>0italic_a > 0b>00b>0italic_b > 0
The segment PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG connects (a,0)(a,0)( italic_a , 0 ) and (0,b)(0,b)( 0 , italic_b ). The equation of the line containing PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG is xa+yb=1\frac{x}{a}+\frac{y}{b}=1divide start_ARG italic_x end_ARG start_ARG italic_a end_ARG + divide start_ARG italic_y end_ARG start_ARG italic_b end_ARG = 1.
该段 PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG 连接 (a,0)0(a,0)( italic_a , 0 )(0,b)0(0,b)( 0 , italic_b ) 。包含 PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG 的直线方程为 xa+yb=11\frac{x}{a}+\frac{y}{b}=1divide start_ARG italic_x end_ARG start_ARG italic_a end_ARG + divide start_ARG italic_y end_ARG start_ARG italic_b end_ARG = 1
The region covered by the family \mathcal{F}caligraphic_F is the area bounded by the lines containing these segments. The endpoints (a,0)(a,0)( italic_a , 0 ) and (0,b)(0,b)( 0 , italic_b ) vary such that a2+b2=1a^{2}+b^{2}=1italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1.
该家族 \mathcal{F}caligraphic_F 所覆盖的区域是由包含这些线段的直线所界定的区域。端点 (a,0)0(a,0)( italic_a , 0 )(0,b)0(0,b)( 0 , italic_b ) 的变化使得 a2+b2=1superscript2superscript21a^{2}+b^{2}=1italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_b start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1
Consider the parametrization a=cosθa=\cos\thetaitalic_a = roman_cos italic_θ, b=sinθb=\sin\thetaitalic_b = roman_sin italic_θ, where 0<θ<π20<\theta<\frac{\pi}{2}0 < italic_θ < divide start_ARG italic_π end_ARG start_ARG 2 end_ARG.
考虑参数化 a=cosθa=\cos\thetaitalic_a = roman_cos italic_θb=sinθb=\sin\thetaitalic_b = roman_sin italic_θ ,其中 0<θ<π2020<\theta<\frac{\pi}{2}0 < italic_θ < divide start_ARG italic_π end_ARG start_ARG 2 end_ARG
The segment PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG connects (cosθ,0)(\cos\theta,0)( roman_cos italic_θ , 0 ) and (0,sinθ)(0,\sin\theta)( 0 , roman_sin italic_θ ). The equation of the line is xcosθ+ysinθ=1\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1divide start_ARG italic_x end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG italic_y end_ARG start_ARG roman_sin italic_θ end_ARG = 1.
该段 PQ¯\overline{PQ}over¯ start_ARG italic_P italic_Q end_ARG 连接 (cosθ,0)0(\cos\theta,0)( roman_cos italic_θ , 0 )(0,sinθ)0(0,\sin\theta)( 0 , roman_sin italic_θ ) 。直线的方程为 xcosθ+ysinθ=11\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1divide start_ARG italic_x end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG italic_y end_ARG start_ARG roman_sin italic_θ end_ARG = 1
We are looking for a unique point CCitalic_C on AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG, distinct from AAitalic_A and BBitalic_B, that does not belong to any segment from \mathcal{F}caligraphic_F other than AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG.
我们正在寻找一个独特的点 CCitalic_C ,该点位于 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 上,与 AAitalic_ABBitalic_B 不同,且不属于除了 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 之外的 \mathcal{F}caligraphic_F 中的任何线段。
Let CCitalic_C be a point on AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG. We can parametrize the points on AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG as (1t)A+tB(1-t)A+tB( 1 - italic_t ) italic_A + italic_t italic_B for 0t10\leq t\leq 10 ≤ italic_t ≤ 1.
CCitalic_CAB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 上的一个点。我们可以将 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 上的点参数化为 (1t)A+tB1(1-t)A+tB( 1 - italic_t ) italic_A + italic_t italic_B ,对于 0t1010\leq t\leq 10 ≤ italic_t ≤ 1
C=(1t)(12,0)+t(0,32)=(1212t,32t)C=(1-t)(\tfrac{1}{2},0)+t(0,\tfrac{\sqrt{3}}{2})=(\tfrac{1}{2}-\tfrac{1}{2}t,% \tfrac{\sqrt{3}}{2}t)italic_C = ( 1 - italic_t ) ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) + italic_t ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG ) = ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG - divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_t , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG italic_t ). Since CCitalic_C is distinct from AAitalic_A and BBitalic_B, we have 0<t<10<t<10 < italic_t < 1.
由于 CCitalic_CAAitalic_ABBitalic_B 不同,因此我们有 0<t<1010<t<10 < italic_t < 1
Suppose the point CCitalic_C lies on some segment PQ¯\overline{PQ}\in\mathcal{F}over¯ start_ARG italic_P italic_Q end_ARG ∈ caligraphic_F, where P=(cosθ,0)P=(\cos\theta,0)italic_P = ( roman_cos italic_θ , 0 ) and Q=(0,sinθ)Q=(0,\sin\theta)italic_Q = ( 0 , roman_sin italic_θ ).
假设点 CCitalic_C 位于某条线段 PQ¯\overline{PQ}\in\mathcal{F}over¯ start_ARG italic_P italic_Q end_ARG ∈ caligraphic_F 上,其中 P=(cosθ,0)0P=(\cos\theta,0)italic_P = ( roman_cos italic_θ , 0 )Q=(0,sinθ)0Q=(0,\sin\theta)italic_Q = ( 0 , roman_sin italic_θ )
The point CCitalic_C satisfies the equation xcosθ+ysinθ=1\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1divide start_ARG italic_x end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG italic_y end_ARG start_ARG roman_sin italic_θ end_ARG = 1.
该点 CCitalic_C 满足方程 xcosθ+ysinθ=11\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1divide start_ARG italic_x end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG italic_y end_ARG start_ARG roman_sin italic_θ end_ARG = 1
Plugging in the coordinates of CCitalic_C:
CCitalic_C 的坐标插入:
1212tcosθ+32tsinθ=1\frac{\frac{1}{2}-\frac{1}{2}t}{\cos\theta}+\frac{\frac{\sqrt{3}}{2}t}{\sin% \theta}=1divide start_ARG divide start_ARG 1 end_ARG start_ARG 2 end_ARG - divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_t end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG italic_t end_ARG start_ARG roman_sin italic_θ end_ARG = 1 1t2cosθ+3t2sinθ=1\frac{1-t}{2\cos\theta}+\frac{\sqrt{3}t}{2\sin\theta}=1divide start_ARG 1 - italic_t end_ARG start_ARG 2 roman_cos italic_θ end_ARG + divide start_ARG square-root start_ARG 3 end_ARG italic_t end_ARG start_ARG 2 roman_sin italic_θ end_ARG = 1 Multiply by 2:  乘以 2: 1tcosθ+3tsinθ=2\frac{1-t}{\cos\theta}+\frac{\sqrt{3}t}{\sin\theta}=2divide start_ARG 1 - italic_t end_ARG start_ARG roman_cos italic_θ end_ARG + divide start_ARG square-root start_ARG 3 end_ARG italic_t end_ARG start_ARG roman_sin italic_θ end_ARG = 2 (1t)sinθ+3tcosθ=2sinθcosθ(1-t)\sin\theta+\sqrt{3}t\cos\theta=2\sin\theta\cos\theta( 1 - italic_t ) roman_sin italic_θ + square-root start_ARG 3 end_ARG italic_t roman_cos italic_θ = 2 roman_sin italic_θ roman_cos italic_θ sinθtsinθ+3tcosθ=sin(2θ)/2\sin\theta-t\sin\theta+\sqrt{3}t\cos\theta=\sin(2\theta)/2roman_sin italic_θ - italic_t roman_sin italic_θ + square-root start_ARG 3 end_ARG italic_t roman_cos italic_θ = roman_sin ( 2 italic_θ ) / 2 We are looking for a point CCitalic_C on AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG such that the only segment from \mathcal{F}caligraphic_F containing CCitalic_C is AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG itself.
我们正在寻找一个点 CCitalic_CAB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 上,使得仅包含 CCitalic_C 的段 \mathcal{F}caligraphic_F 就是 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 本身。
The segment AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG is part of the family \mathcal{F}caligraphic_F.
该段 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG 属于家族 \mathcal{F}caligraphic_F 的一部分。
For AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARG, P=A=(12,0)P=A=(\tfrac{1}{2},0)italic_P = italic_A = ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 ) and Q=B=(0,32)Q=B=(0,\tfrac{\sqrt{3}}{2})italic_Q = italic_B = ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG ).
对于 AB¯\overline{AB}over¯ start_ARG italic_A italic_B end_ARGP=A=(12,0)120P=A=(\tfrac{1}{2},0)italic_P = italic_A = ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , 0 )Q=B=(0,32)032Q=B=(0,\tfrac{\sqrt{3}}{2})italic_Q = italic_B = ( 0 , divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG )
Here a=12a=\tfrac{1}{2}italic_a = divide start_ARG 1 end_ARG start_ARG 2 end_ARG, b=32b=\tfrac{\sqrt{3}}{2}italic_b = divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG.
这里 a=1212a=\tfrac{1}{2}italic_a = divide start_ARG 1 end_ARG start_ARG 2 end_ARGb=3232b=\tfrac{\sqrt{3}}{2}italic_b = divide start_ARG square-root start_ARG 3 end_ARG end_ARG start_ARG 2 end_ARG
$a^{2}+b^{2}=(\tfrac{1}{2})^{2}+(\tfrac{\sqrt{3}}{2})^{2}=\tfrac{1}{4}+\tfrac{3}{4}=1$. The line segment $\overline{AB}$ connects $(\tfrac{1}{2},0)$ and $(0,\tfrac{\sqrt{3}}{2})$.
The equation of the line is $\frac{x}{1/2}+\frac{y}{\sqrt{3}/2}=1$, which simplifies to $2x+\frac{2}{\sqrt{3}}y=1$, or $\sqrt{3}x+y=\frac{\sqrt{3}}{2}$.
This matches the equation $y=-\sqrt{3}x+\frac{\sqrt{3}}{2}$.
Consider the envelope of the family of lines $\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1$.
Differentiating with respect to $\theta$:
$$-\frac{x\sin\theta}{\cos^{2}\theta}+\frac{y\cos\theta}{\sin^{2}\theta}=0$$
$$\frac{y\cos\theta}{\sin^{2}\theta}=\frac{x\sin\theta}{\cos^{2}\theta}$$
$$y\cos^{3}\theta=x\sin^{3}\theta$$
$$\frac{y}{x}=\tan^{3}\theta$$
$$\tan\theta=(\frac{y}{x})^{1/3}$$
From the line equation: $y\sin\theta=\sin\theta\cos\theta-x\sin\theta$
$$y=\cos\theta-x$$
Substitute $\cos\theta=\frac{1}{\sqrt{1+\tan^{2}\theta}}=\frac{1}{\sqrt{1+(y/x)^{2/3}}}$
$$\cos\theta=\frac{x^{1/3}}{(x^{2/3}+y^{2/3})^{1/2}}$$
$$\sin\theta=\tan\theta\cos\theta=(\frac{y}{x})^{1/3}\frac{x^{1/3}}{(x^{2/3}+y^{2/3})^{1/2}}=\frac{y^{1/3}}{(x^{2/3}+y^{2/3})^{1/2}}$$
Substitute back into the line equation:
$$\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1$$
$$x(x^{2/3}+y^{2/3})^{1/2}/x^{1/3}+y(x^{2/3}+y^{2/3})^{1/2}/y^{1/3}=1$$
$$x^{2/3}(x^{2/3}+y^{2/3})^{1/2}+y^{2/3}(x^{2/3}+y^{2/3})^{1/2}=1$$
$$(x^{2/3}+y^{2/3})^{3/2}=1$$
$$x^{2/3}+y^{2/3}=1$$
The envelope of the family $\mathcal{F}$ is the astroid $x^{2/3}+y^{2/3}=1$.
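The envelope computation can be sanity-checked numerically. The sketch below is not part of the original trace; it assumes the tangency point for parameter $\theta$ is $(\cos^{3}\theta,\sin^{3}\theta)$, which follows from the expressions for $\cos\theta$ and $\sin\theta$ above once $x^{2/3}+y^{2/3}=1$. The function names are illustrative only. It checks that this point lies on the line for $\theta$, makes the $\theta$-derivative of the line equation vanish, and satisfies the astroid equation.

```python
import math

def tangency_point(theta):
    """Assumed envelope point for the line x/cos(theta) + y/sin(theta) = 1."""
    return math.cos(theta) ** 3, math.sin(theta) ** 3

def line_residual(x, y, theta):
    """How far (x, y) is from satisfying the line equation for this theta."""
    return x / math.cos(theta) + y / math.sin(theta) - 1.0

def theta_derivative(x, y, theta):
    """d/dtheta of x/cos(theta) + y/sin(theta) with (x, y) held fixed."""
    return (x * math.sin(theta) / math.cos(theta) ** 2
            - y * math.cos(theta) / math.sin(theta) ** 2)

def astroid_residual(x, y):
    """How far (x, y) is from the astroid x^(2/3) + y^(2/3) = 1."""
    return x ** (2 / 3) + y ** (2 / 3) - 1.0

def max_residual(n=50):
    """Largest absolute residual over n interior angles in (0, pi/2)."""
    worst = 0.0
    for k in range(1, n + 1):
        theta = (math.pi / 2) * k / (n + 1)
        x, y = tangency_point(theta)
        worst = max(worst,
                    abs(line_residual(x, y, theta)),
                    abs(theta_derivative(x, y, theta)),
                    abs(astroid_residual(x, y)))
    return worst

print(max_residual())
```

All three residuals should be at the level of floating-point rounding, which is consistent with the astroid being the envelope of the family.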
The segments from $\mathcal{F}$ are tangent to this astroid.
The point $C$ on $\overline{AB}$ that we are looking for is the point where $\overline{AB}$ is tangent to the astroid.
The equation of $\overline{AB}$ is $y=-\sqrt{3}x+\frac{\sqrt{3}}{2}$.
Substitute into the astroid equation:
$$x^{2/3}+(-\sqrt{3}x+\frac{\sqrt{3}}{2})^{2/3}=1$$
Let $u=x^{1/3}$ and $v=y^{1/3}$. The astroid is $u^{2}+v^{2}=1$.
The point $C$ lies on both the line $\overline{AB}$ and the astroid.
Differentiating the astroid equation implicitly:
$$\frac{2}{3}x^{-1/3}+\frac{2}{3}y^{-1/3}y^{\prime}=0$$
$$y^{\prime}=-\frac{y^{1/3}}{x^{1/3}}=-(\frac{y}{x})^{1/3}$$
The slope of the tangent to the astroid at a point $(x,y)$ is $-\sqrt[3]{\frac{y}{x}}$.
The slope of the line $\overline{AB}$ is $-\sqrt{3}$.
So, at the point of tangency, $-\sqrt[3]{\frac{y}{x}}=-\sqrt{3}$.
$$\sqrt[3]{\frac{y}{x}}=\sqrt{3}$$
$$\frac{y}{x}=(\sqrt{3})^{3}=3\sqrt{3}$$
The point $C$ lies on the line $y=-\sqrt{3}x+\frac{\sqrt{3}}{2}$ and satisfies $\frac{y}{x}=3\sqrt{3}$.
So $y=3\sqrt{3}x$. Substitute into the line equation:
$$3\sqrt{3}x=-\sqrt{3}x+\frac{\sqrt{3}}{2}$$
$$3x=-x+\frac{1}{2}$$
$$4x=\frac{1}{2}$$
$$x=\frac{1}{8}$$
Then $y=3\sqrt{3}(\frac{1}{8})=\frac{3\sqrt{3}}{8}$.
The point $C$ is $(\frac{1}{8},\frac{3\sqrt{3}}{8})$.
We need to check if this point lies on the segment $\overline{AB}$.
The point $C$ is of the form $(\tfrac{1}{2}-\tfrac{1}{2}t,\tfrac{\sqrt{3}}{2}t)$.
$\tfrac{1}{2}-\tfrac{1}{2}t=\tfrac{1}{8}\implies 4-4t=1\implies 3=4t\implies t=\tfrac{3}{4}$. $\tfrac{\sqrt{3}}{2}t=\tfrac{\sqrt{3}}{2}\cdot\tfrac{3}{4}=\tfrac{3\sqrt{3}}{8}$. So $C$ lies on $\overline{AB}$.
We need to make sure $0<t<1$, which is true for $t=\frac{3}{4}$.
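The coordinates of $C$ and the parameter $t$ derived above can be checked numerically. This is a minimal sketch added for illustration, not part of the original trace; the helper names are hypothetical. It verifies that $C=(\frac{1}{8},\frac{3\sqrt{3}}{8})$ lies on the line through $A$ and $B$, lies on the astroid, has tangent slope $-\sqrt{3}$ there, and corresponds to $t=\frac{3}{4}$ in the segment parametrization.

```python
import math

SQRT3 = math.sqrt(3.0)

def line_ab(x):
    """y-coordinate on the line through A and B: y = -sqrt(3) x + sqrt(3)/2."""
    return -SQRT3 * x + SQRT3 / 2

def astroid_lhs(x, y):
    """Left-hand side of the astroid equation x^(2/3) + y^(2/3) = 1."""
    return x ** (2 / 3) + y ** (2 / 3)

def astroid_slope(x, y):
    """Tangent slope of the astroid at (x, y) in the first quadrant."""
    return -((y / x) ** (1 / 3))

def point_on_ab(t):
    """Parametrization of AB: t = 0 gives A, t = 1 gives B."""
    return 0.5 - 0.5 * t, SQRT3 / 2 * t

x_c, y_c = 1 / 8, 3 * SQRT3 / 8
t_c = 1 - 2 * x_c  # solve 1/2 - t/2 = x_c for t

print(line_ab(x_c) - y_c)         # residual on the line AB
print(astroid_lhs(x_c, y_c))      # should be close to 1
print(astroid_slope(x_c, y_c))    # should be close to -sqrt(3)
print(t_c, point_on_ab(t_c))
```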
Now, we need to check if this point $C$ does not belong to any other segment from $\mathcal{F}$.
The point $C$ lies on the astroid. The segment $\overline{AB}$ is tangent to the astroid at $C$.
Any other segment from $\mathcal{F}$ that contains $C$ would have $C$ as an interior point, not a tangent point.
Consider the parametrization of the family $\mathcal{F}$.
A point $(x,y)$ belongs to some segment in $\mathcal{F}$ iff there exists $\theta$ such that $\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1$.
For the point $C=(\frac{1}{8},\frac{3\sqrt{3}}{8})$, we have:
$$\frac{1/8}{\cos\theta}+\frac{3\sqrt{3}/8}{\sin\theta}=1$$
$$\frac{1}{\cos\theta}+\frac{3\sqrt{3}}{\sin\theta}=8$$
$$\sin\theta+3\sqrt{3}\cos\theta=8\sin\theta\cos\theta$$
$$\sin\theta+3\sqrt{3}\cos\theta=4\sin(2\theta)$$
We know that $C$ lies on $\overline{AB}$, which corresponds to some specific segment in $\mathcal{F}$.
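The trace does not solve this equation for $\theta$, so a numerical scan may help show why only one segment passes through $C$. The sketch below is an illustration added here, not part of the original trace. It evaluates $g(\theta)=\frac{1/8}{\cos\theta}+\frac{3\sqrt{3}/8}{\sin\theta}$ on a grid over $(0,\frac{\pi}{2})$. Since $1/\cos\theta$ and $1/\sin\theta$ are both convex on that interval, $g$ has a single minimizer, and the scan suggests that the minimum value is $1$, attained at $\theta=\frac{\pi}{3}$, the angle of $\overline{AB}$. The function names and the grid size are assumptions.

```python
import math

X_C, Y_C = 1 / 8, 3 * math.sqrt(3.0) / 8

def g(theta, x=X_C, y=Y_C):
    """Left-hand side x/cos(theta) + y/sin(theta) of the line equation."""
    return x / math.cos(theta) + y / math.sin(theta)

def scan_minimum(n=9000):
    """Return (min value, argmin) of g over n - 1 interior grid angles."""
    best_value, best_theta = float("inf"), None
    for k in range(1, n):
        theta = (math.pi / 2) * k / n
        value = g(theta)
        if value < best_value:
            best_value, best_theta = value, theta
    return best_value, best_theta

min_value, min_theta = scan_minimum()
print(min_value, min_theta, math.pi / 3)
```

If $g(\theta)\geq 1$ with equality only at $\theta=\frac{\pi}{3}$, then $g(\theta)=1$ has no other solution in $(0,\frac{\pi}{2})$, which matches the claim that $C$ belongs to no segment of $\mathcal{F}$ other than $\overline{AB}$.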
The problem statement assumes the existence and uniqueness of such a point $C$.
The unique point $C$ on $\overline{AB}$, distinct from $A$ and $B$, that does not belong to any segment from $\mathcal{F}$ other than $\overline{AB}$.
This means that if $C$ lies on another segment $\overline{P^{\prime}Q^{\prime}}\in\mathcal{F}$, then $\overline{P^{\prime}Q^{\prime}}=\overline{AB}$.
The point $C$ is the point of tangency of $\overline{AB}$ with the astroid.
Now we need to find $OC^{2}$.
$OC^{2}=(\frac{1}{8})^{2}+(\frac{3\sqrt{3}}{8})^{2}=\frac{1}{64}+\frac{27}{64}=\frac{28}{64}=\frac{7}{16}$. Here $p=7$ and $q=16$.
$p$ and $q$ are relatively prime.
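The arithmetic for $OC^{2}$ can be confirmed with exact rational numbers. This short check is an added illustration, not part of the original trace. It uses the fact that $(\frac{3\sqrt{3}}{8})^{2}=\frac{9\cdot 3}{64}$, so the square root never has to be evaluated in floating point.

```python
from fractions import Fraction
from math import gcd

x_sq = Fraction(1, 8) ** 2       # (1/8)^2 = 1/64
y_sq = Fraction(3, 8) ** 2 * 3   # (3*sqrt(3)/8)^2 = 9*3/64 = 27/64
oc_sq = x_sq + y_sq              # Fraction reduces to lowest terms automatically

p, q = oc_sq.numerator, oc_sq.denominator
print(oc_sq, gcd(p, q), p + q)
```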
$p+q=7+16=23$.
Final check of the reasoning:
The family $\mathcal{F}$ consists of unit-length segments with endpoints on the axes.
The envelope of this family is the astroid $x^{2/3}+y^{2/3}=1$.
The segments from $\mathcal{F}$ are tangent to the astroid.
The segment $\overline{AB}$ is also a member of the family $\mathcal{F}$.
We are looking for a point $C$ on $\overline{AB}$ that lies on the astroid.
This point $C$ is the point of tangency of $\overline{AB}$ with the astroid.
If a point on $\overline{AB}$ lies on the astroid, it belongs to the tangent segment at that point. Since $\overline{AB}$ is the tangent segment, this point does not belong to any other segment from $\mathcal{F}$.
Consider the case when the segment $\overline{AB}$ intersects the astroid at two points.
If the segment $\overline{AB}$ is not tangent to the astroid, then there are two points of intersection.
Let the points of intersection be $C_{1}$ and $C_{2}$.
If $C_{1}$ lies on $\overline{AB}$, then the tangent to the astroid at $C_{1}$ is the segment $\overline{AB}$.
If $C_{2}$ lies on $\overline{AB}$, then the tangent to the astroid at $C_{2}$ is the segment $\overline{AB}$.
The slope of $\overline{AB}$ is $-\sqrt{3}$.
We found the point on the astroid with tangent slope $-\sqrt{3}$.
The problem statement guarantees the uniqueness of such a point $C$.
The point $C$ must be interior to $\overline{AB}$, so $0<t<1$.
The parameter $t=\frac{3}{4}$ is in this range.
The point $C$ is distinct from $A$ and $B$.
If $C=A$, then $t=0$, $C=(\tfrac{1}{2},0)$.
If $C=B$, then $t=1$, $C=(0,\tfrac{\sqrt{3}}{2})$.
The point of tangency $C$ is the unique point on $\overline{AB}$ (excluding endpoints) that lies on the astroid.
Solution:
Let the family of segments $\mathcal{F}$ be defined by segments $\overline{PQ}$ of unit length with $P$ on the positive $x$-axis and $Q$ on the positive $y$-axis. Let $P=(a,0)$ and $Q=(0,b)$. The length condition gives $a^{2}+b^{2}=1$. The segment $\overline{PQ}$ lies on the line $\frac{x}{a}+\frac{y}{b}=1$. Using the parametrization $a=\cos\theta$, $b=\sin\theta$ with $0<\theta<\frac{\pi}{2}$, the equation of the line is $\frac{x}{\cos\theta}+\frac{y}{\sin\theta}=1$.
The envelope of this family of lines is the astroid $x^{2/3}+y^{2/3}=1$.
The segments from $\mathcal{F}$ are tangent to this astroid.
The segment $\overline{AB}$ connects $A(\tfrac{1}{2},0)$ and $B(0,\tfrac{\sqrt{3}}{2})$. The equation of the line containing $\overline{AB}$ is $y=-\sqrt{3}x+\frac{\sqrt{3}}{2}$. This segment is part of the family $\mathcal{F}$.
We are looking for a unique point $C$ on $\overline{AB}$, distinct from $A$ and $B$, that does not belong to any segment from $\mathcal{F}$ other than $\overline{AB}$. This means $C$ is the point where $\overline{AB}$ is tangent to the astroid.
The slope of the tangent to the astroid at a point $(x,y)$ is $-\sqrt[3]{\frac{y}{x}}$. The slope of $\overline{AB}$ is $-\sqrt{3}$.
Setting the slopes equal: $-\sqrt[3]{\frac{y}{x}}=-\sqrt{3}\implies\frac{y}{x}=3\sqrt{3}$.