
团队 # 1111111 1 页 共 11

问题选择
Problem Chosen

C

2025
MCM/ICM
摘要表
Summary Sheet

团队控制号码
Team Control Number

1111111

从数据到领奖台:奥运奖牌的科学解析
From data to podium: the science of Olympic medals

在奥运的浩瀚历史长河中,每一枚奖牌都承载着运动员的热血与梦想,它们的诞生绝非偶然,背后是无数因素交织的结果。从 1896 年首届现代奥运会的激情启幕,到 2024 年的奥运盛会,一届又一届的夏季奥运会见证了人类体能与竞技精神的不断突破。那些闪耀在领奖台上的荣耀瞬间,不仅是运动员个人的高光时刻,更是各个国家体育实力的生动展现。
In the long history of the Olympic Games, every medal carries the passion and dreams of athletes. No medal is won by accident; each is the result of countless interwoven factors. From the first modern Olympic Games in 1896 to the 2024 Games, successive Summer Olympics have witnessed continual breakthroughs in human physical ability and competitive spirit. The glorious moments on the podium are not only personal highlights for the athletes but also a vivid display of each country's sporting strength.

基于历史夏季奥运会数据,构建了一套多维度奥运奖牌预测模型,旨在分析奖牌分布规律并为2028年洛杉矶奥运会提供科学预测。
Based on the data of the historical Summer Olympic Games, a multi-dimensional Olympic medal prediction model was constructed to analyze the distribution of medals and provide scientific prediction for the 2028 Los Angeles Olympic Games.

首先,通过整合运动员信息、奖牌统计、东道主数据及赛事项目数据,完成数据清洗与聚合,基于经济学与体育学理论构建的综合经济实力指数(CESI)、综合人才基数指数(CTPI)、动态东道主效应(HostBoost)、参赛选手实力指数(ACI)、战略聚焦指数(SFI)、参数项目主成分(Breadth_PC)等6项核心特征的数据集。
First, athlete records, medal counts, host data, and event programme data were integrated, cleaned, and aggregated. Drawing on economic and sports-science theory, we then built a dataset of six core features: the comprehensive economic strength index (CESI), the comprehensive talent base index (CTPI), the dynamic host effect (HostBoost), the competitor strength index (ACI), the strategic focus index (SFI), and the principal component of events entered (Breadth_PC).

针对问题1,首先建立经典线性回归模型,采用普通最小二乘法(OLS)估计回归系数;加入结合交叉验证的随机森林回归模型进行对比,选择检验结果更好的随机森林回归模型来预测2028年美国洛杉矶夏季奥运会的奖牌榜;对比同一国家不同届次2024与2028年的数据分析国家的进退步程度,同时模型计算了奖牌榜预测结果的检验以衡量预测的不确定性;然后延用随机森林回归与交叉验证优化模型,算出尚未获奖的国家在下届奥运会获得第一枚奖牌的概率;之后加权比率法来探讨比赛项目与各国奖牌数量之间的关系,最后计算东道主与非东道主国家的平均奖牌数,建立东道主选择对比赛项目结果影响的模型,按国家和奖牌数排序,列出每个国家最重要的赛事。
For Problem 1, we first built a classical linear regression model and estimated its coefficients by ordinary least squares (OLS). A random forest regression model with cross-validation was added for comparison, and the random forest, which gave the better test results, was chosen to predict the medal table of the 2028 Summer Olympics in Los Angeles, USA. Comparing each country's 2024 results with its 2028 predictions shows how far it is likely to improve or decline, and test statistics on the predicted medal table measure the uncertainty of the prediction. The same random forest regression with cross-validation is then used to estimate the probability that a country without a medal wins its first at the next Games. A weighted ratio method next explores the relationship between events and each country's medal count. Finally, we compute the average medal counts of host and non-host countries, model the effect of the host's choice of events on results, and list each country's most important events, sorted by country and medal count.

结果显示,美国和中国仍将主导金牌榜,但新兴国家(如圣卢西亚)首枚奖牌概率显著提升;有103个国家2028年预测金牌数大于2024年(例如冈比亚),有47个退步国家(例如哥伦比亚);进一步分析表明,东道主通过新增赛事项目可额外获得约2枚奖牌,且游泳、田径等项目对奖牌贡献率最高。
The results show that the United States and China will continue to dominate the gold medal table, while the probability of a first medal rises markedly for emerging countries such as Saint Lucia. For 103 countries the predicted 2028 gold medal count exceeds the 2024 count (e.g. Gambia), and 47 countries are predicted to decline (e.g. Colombia). Further analysis shows that the host can gain about 2 additional medals through newly added events, and that swimming and athletics contribute most to medal counts.

针对问题2,选择了三个国家(中国CHN、美国USA和罗马尼亚ROU),提出混合模型(面板回归+加权正则化),量化了教练更换对奖牌数的贡献(如郎平对美国奖牌数的影响因子为7.75,Perticaroli对罗马尼亚奖牌数的影响因子为5.61)。
For Problem 2, three countries were selected (China, CHN; the United States, USA; and Romania, ROU), and a hybrid model (panel regression + weighted regularization) was proposed to quantify the contribution of coaching changes to medal counts (for example, an impact factor of 7.75 for Lang Ping on the United States' medal count and 5.61 for Perticaroli on Romania's).

针对问题3,根据问题1与问题2的模型进行总结,通过数据来检验不同地区的国家在奖牌分布是否存在显著差异;区域聚类分析揭示欧洲与非洲奖牌分布差异显著,建议经济弱势国家聚焦“低投入-高回报”项目(如举重)研究结果为国家奥林匹克委员会(NOC)优化资源配置、制定战略政策提供了数据驱动的决策支持。
For Problem 3, we draw together the models from Problems 1 and 2 and test whether the medal distributions of countries in different regions differ significantly. Regional cluster analysis reveals a significant difference between the medal distributions of Europe and Africa, and we suggest that economically disadvantaged countries focus on "low-input, high-return" events such as weightlifting. The results give National Olympic Committees (NOCs) data-driven decision support for optimizing resource allocation and setting strategy.

关键词:奥运奖牌预测;随机森林回归;面板回归;LASSO;决策建议
Keywords: Olympic medal prediction; random forest regression; panel regression; LASSO; decision recommendations

目录
Contents

1 介绍
1 Introduction

1.1 问题重述
1.1 Problem Restatement

1.2 我们的工作
1.2 Our Work

2 假设和理由
2 Assumptions and Rationale

3 符号
3 Notation

4 模型准备
4 Model Preparation

5 问题一
5 Problem 1

5.1 问题一分析
5.1 Problem 1 Analysis

5.2 问题一模型建立与求解
5.2 Problem 1: Model Building and Solving

5.2.1 夏季奥运会奖牌榜的预测模型
5.2.1 Prediction model for the medal table of the Summer Olympics

5.2.2 夏季奥运会奖牌榜的预测结果与不确定性量化
5.2.2 Prediction results and uncertainty quantification for the medal table of the Summer Olympics

5.2.3 未获奖国家的获奖概率预测模型
5.2.3 First-medal probability prediction model for countries without a medal

5.2.4 未获奖国家的获奖概率预测结果
5.2.4 First-medal probability prediction results for countries without a medal

5.2.5 比赛项目与各国奖牌数量之间的关系研究
5.2.5 Relationship between events and each country's medal count

5.2.6 东道主选择对比赛项目结果的影响
5.2.6 Effect of the host's event choices on results

6 问题二
6 Problem 2

6.1 问题二分析
6.1 Problem 2 Analysis

6.2 问题二模型建立
6.2 Problem 2 Model Building

6.3 问题二模型求解及检验与评估
6.3 Problem 2 Model Solving, Testing, and Evaluation

7 问题三
7 Problem 3

7.1 奖牌分布存在显著差异
7.1 Significant differences in medal distribution

7.2 对奥委会的战略建议
7.2 Strategic recommendations for Olympic Committees

8 模型评价与推广
8 Model Evaluation and Generalization

9 参考文献
9 References

10 附录
10 Appendix


介绍
Introduction

问题重述
Problem Restatement

问题 1:奥运奖牌预测:数据跳水
Question 1: Olympic Medal Predictions: Data Diving

问题重述:开发一个可以预测各国在奥运会上的金牌和总奖牌数模型。模型需量化预测的不确定性,提供预测区间,并评估模型性能。基于此模型,预测2028年洛杉矶奥运会的奖牌榜,分析哪些国家可能进步,哪些可能退步。模型应涵盖尚未获奖牌的国家,并预测首次获奖牌的国家数量及其概率。探索项目与奖牌数量的关系,分析哪些运动对不同国家最重要,以及东道主选择的项目如何影响结果。
Problem restatement: Develop a model that predicts each country's gold medal count and total medal count at the Olympic Games. The model must quantify the uncertainty of its predictions, provide prediction intervals, and evaluate its own performance. Based on this model, predict the medal table for the 2028 Los Angeles Olympics and analyze which countries are likely to improve and which are likely to decline. The model should cover countries that have not yet won a medal and predict how many of them will win their first medal, with the associated probability. Explore the relationship between events and medal counts, analyze which sports matter most to different countries, and assess how the events chosen by the host affect results.

问题 2:奥林匹克奖牌:伟大的教练效应
Problem 2: Olympic Medals: The Great Coaching Effect

问题重述:通过分析数据,识别是否存在因教练流动而导致的奖牌数量变化,并量化伟大教练效应的贡献。需要选择三个具有代表性的国家,分析它们在哪些项目上应考虑引入“伟大教练”,并评估这种投资可能带来的奖牌数量变化。
Problem restatement: Analyze the data to identify whether changes in medal counts can be attributed to coaches moving between countries, and quantify the contribution of the "great coach" effect. Select three representative countries, identify the sports in which each should consider bringing in a "great coach", and estimate the change in medal count that such an investment could bring.

问题 3:奥运奖牌:弱者与泰坦
Question 3: Olympic Medals: Underdogs vs. Titans

问题重述:通过模型分析,揭示关于奥运奖牌数量的其他独特见解。同时站在各国奥委会的角度上,这些见解应能够为其在制定奥运战略、分配资源以及选择参赛项目等方面提供有价值的参考
Problem restatement: Use the models to reveal other original insights about Olympic medal counts. From the perspective of National Olympic Committees, these insights should offer useful guidance for setting Olympic strategy, allocating resources, and choosing which events to enter.

我们的工作
Our work

假设和理由
Assumptions and rationales

假设1:竞技体育比赛中的主场优势是客观存在的,也称主场效应,一般认为在主客场平衡制的比赛中,主队在比赛中取胜的概率更高
Hypothesis 1: Home advantage, also called the home-field effect, objectively exists in competitive sport; in a balanced home-and-away format, the home team is generally believed to have a higher probability of winning.

解释:主场优势受旅途奔波、赛场环境、观众、裁判、运动员心理因素等影响,旅途和赛场环境不是影响主场优势的主要因素
Explanation: Home advantage is influenced by travel fatigue, venue conditions, spectators, officiating, and athletes' psychological state; of these, travel and venue conditions are not the main factors.

假设2:每届奥运会新增的运动项目和赛事会影响奖牌数量。
Hypothesis 2: The number of new sports and events added to each Olympic Games will affect the number of medals.

说明新运动或赛事的引入可以为某些国家提供在其具有竞争优势的项目中赢得奖牌的机会。历史趋势表明,国家通常会投资于新兴运动以最大化其奖牌潜力
Description: The introduction of a new sport or event can provide certain countries with the opportunity to win medals in events in which they have a competitive advantage. Historical trends suggest that countries typically invest in emerging sports to maximize their medal potential.

假设3:奥运会表现可能会因外部因素(如“伟大教练效应”)而波动。
Hypothesis 3: Olympic performance may fluctuate due to external factors such as the "great coach effect".

解释精英教练的影响(例如郎平在排球、贝拉·卡罗里在体操)可以显著影响一个国家在特定运动中的表现
Explanation: Elite coaches (e.g. Lang Ping in volleyball, Béla Károlyi in gymnastics) can significantly affect a country's performance in a particular sport.

假设4:不同国家根据文化、经济和战略因素优先发展不同的运动项目。
Hypothesis 4: Different countries prioritize the development of different sports based on cultural, economic and strategic factors.

说明某些运动在特定国家更受欢迎或获得更多资金支持
Description: Certain sports are more popular or receive more funding in certain countries

假设5:奖牌预测存在不确定性,并使用区间估计来量化这种不确定性。
Hypothesis 5: Medal predictions are uncertain, and interval estimates are used to quantify that uncertainty.

解释由于奥运会表现的内在可变性(例如意外伤病、运动员“黄金期”),我们假设预测应包括对不确定性的衡量。
Explanation: Due to the inherent variability of Olympic performance (e.g., unexpected injuries, athletes' "golden period"), we assume that forecasts should include a measure of uncertainty.

符号
Notation

表 1 列出了本文中使用的关键数学符号
Table 1 lists the key mathematical notation used in this paper.

表 1 本文中使用的符号
Table 1: Symbols used in this paper

| 符号 Symbol | 描述 Description |
|---|---|
| $Y$ | 预测的金牌数或总奖牌数 / Predicted number of gold medals or total medals |
| $X_i\ (i = 1, 2, \cdots, n)$ | 特征变量,如历史奖牌数、运动员数量、项目数量 / Feature variables, such as historical medal counts, number of athletes, number of events |
| $\beta_0$ | 截距项,表示当所有特征变量为零时的基准奖牌数 / Intercept: the baseline medal count when all feature variables are zero |
| $\beta_i\ (i = 1, 2, \cdots, n)$ | 回归系数,反映各个特征变量对奖牌数的影响程度 / Regression coefficients: the strength of each feature variable's effect on the medal count |
| $\varepsilon$ | 误差项,表示回归模型的随机波动和无法解释的部分 / Error term: the random fluctuation and unexplained part of the regression model |

模型准备
Model preparation

1.数据预处理
1. Data preprocessing

提取国家信息:从运动员数据集中提取国家信息,便于后续按国家分组统计。
Extracting country information: Extracting country information from athlete datasets for subsequent grouping by country.

处理年份数据:对年份数据排序并提取唯一值,便于按年份分组统计。
Process year data: Sort and extract unique values from year data to facilitate grouping by year.

重命名和填充数据:使用“年份(Year)”和“国家代码(NOC)”作为核心键,连接奖牌榜、东道主信息和项目设置;通过“运动员ID”或“项目代码”关联运动员记录与项目数据;对团队项目进行拆分,提取国家代码,并聚合到国家层级;并使用fillna(0)填充缺失值(只有Basque Pelota的Basque Pelota的1988数据缺失);验证数据完整性和一致性,确保每个国家的奖牌总数与运动员记录中的奖牌数一致;检查东道主信息与项目设置的对应关系,验证东道主年份是否新增了特定项目。
Rename and fill data: Use "Year" and the country code (NOC) as core keys to join the medal table, host information, and event programme; link athlete records to event data through the athlete ID or event code; split team events, extract the country code, and aggregate to country level; fill missing values with fillna(0) (the only gap is the 1988 entry for Basque Pelota); verify data integrity and consistency by checking that each country's medal total matches the medal count in the athlete records; and check the host information against the event programme to verify whether specific events were added in host years.
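
The key-based join described above can be sketched as follows. This is a minimal illustration in plain Python, not the team's code: the record layouts, the function name `merge_on_year_noc`, and the sample values are assumptions made for the example. With pandas, the same step would be a left `merge` on `Year` and `NOC` followed by `fillna(0)`.

```python
from collections import defaultdict


def merge_on_year_noc(medal_rows, athlete_rows, host_by_year):
    """Left-join medal totals and a host flag onto (Year, NOC) athlete aggregates.

    All three inputs are hypothetical, simplified stand-ins for the contest files.
    """
    # Aggregate athlete records to country level: distinct athletes per (Year, NOC).
    athletes = defaultdict(set)
    for row in athlete_rows:
        athletes[(row["Year"], row["NOC"])].add(row["Name"])

    # Index the medal table by the same composite key.
    medals = {(r["Year"], r["NOC"]): r["Total"] for r in medal_rows}

    merged = []
    for (year, noc), names in sorted(athletes.items()):
        merged.append({
            "Year": year,
            "NOC": noc,
            "Athletes": len(names),
            # A country absent from the medal table gets 0, mirroring fillna(0).
            "Total": medals.get((year, noc), 0),
            "IsHost": int(host_by_year.get(year) == noc),
        })
    return merged


# Illustrative rows only.
athlete_rows = [
    {"Year": 2024, "NOC": "USA", "Name": "Athlete A"},
    {"Year": 2024, "NOC": "USA", "Name": "Athlete B"},
    {"Year": 2024, "NOC": "GAM", "Name": "Athlete C"},
]
medal_rows = [{"Year": 2024, "NOC": "USA", "Total": 126}]
host_by_year = {2024: "FRA"}

table = merge_on_year_noc(medal_rows, athlete_rows, host_by_year)
```

Keeping countries with no medal row (here `GAM`) in the merged table matters for the later first-medal analysis, which needs the zero-medal countries as explicit rows.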

2.特征工程
2. Feature engineering

定义特征提取函数:提取特定国家和比赛项目的特征数据,包括参赛人员数量、总参赛次数、队伍数量、拿过奖牌的队员数量、近三年赛事擅长能力、赛事擅长能力、是否为主办方、全赛事历史总奖牌数、赛事举行次数以及金牌、银牌、铜牌数量等。
Define a feature extraction function: for a given country and event, extract the number of participants, total number of appearances, number of teams, number of athletes who have won medals, strength in the event over the past three years, overall strength in the event, whether the country is the host, the total number of medals across the history of all events, the number of times the event has been held, and the numbers of gold, silver, and bronze medals.

问题一
Problem 1

问题一分析
Problem 1 Analysis

要预测2028年美国洛杉矶夏季奥运会奖牌榜,分析各国在历届奥运会中的表现,包括金牌数和总奖牌数的变化趋势,回归分析能够找出不同因素对奖牌数的影响,从而为未来的奖牌榜预测提供数据支持。因为要求不是基于历史奖牌数据,所以这些因素可能包括国家经济指标(GDP)、人口数量、东道主效应、奥运参赛选手、数量参赛的项目数量和种类等,通过这些特征变量的分析,建立经典线性回归模型,采用普通最小二乘法(OLS)估计回归系数得到预测模型。
To predict the medal table of the 2028 Summer Olympics in Los Angeles, USA, we analyze each country's performance at past Games, including trends in gold and total medal counts. Regression analysis identifies how different factors affect medal counts and so provides data support for the prediction. Because the requirement is that predictions not be based on historical medal data, candidate factors include national economic indicators (GDP), population, the host effect, the number of Olympic athletes, and the number and type of events entered. From these feature variables we build a classical linear regression model and estimate its coefficients by ordinary least squares (OLS).

并且使用随机森林优化模型预测2028年美国洛杉矶夏季奥运会的奖牌榜,比较2024与2028年的数据,分析国家的进退步程度,同时算出尚未获奖的国家在下届奥运会获得首枚奖牌的概率。最后用加权比率法来探究比赛项目与各国奖牌数量之间的关系,采用SHAP确定哪些特征在模型中起到了重要作用,并比较是否为东道主两种情况,判断东道主选择的比赛项目对获得奖牌的影响。
A random forest model is then used to predict the medal table of the 2028 Summer Olympics in Los Angeles, USA. Comparing 2024 data with the 2028 predictions shows how much each country improves or declines, and the model also gives the probability that a country without a medal wins its first at the next Games. Finally, a weighted ratio method explores the relationship between events and each country's medal count, SHAP identifies which features matter most in the model, and a comparison of host and non-host cases assesses how the events chosen by the host affect medals won.

问题一模型建立与求解
Problem 1: Model building and solving

夏季奥运会奖牌榜的预测模型
A prediction model for the medal table of the Summer Olympics

一、夏季奥运会奖牌榜的预测模型建立
1. Establishment of a prediction model for the medal table of the Summer Olympics

(一)经典线性回归模型
(1) Classical linear regression model

1.数学模型
1. Mathematical model

线性回归模型假设奖牌数与一系列特征之间存在线性关系。设定回归方程如下。
Linear regression models assume that there is a linear relationship between the number of medals and a range of characteristics. Set the regression equation as follows.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{1}$$
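
As a rough illustration of how the coefficients of such a linear model can be estimated by ordinary least squares, the sketch below solves the normal equations $(X^\top X)\beta = X^\top y$ in plain Python. The helper names `solve_linear_system` and `fit_ols` and the toy data are assumptions for this example, not the paper's implementation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [beta_0, beta_1, ..., beta_n] minimising the residual sum of squares."""
    # Prepend a constant 1 so that beta_0 acts as the intercept.
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data generated from Y = 1 + 2*X1 + 3*X2 without noise,
# so the fit should recover those coefficients.
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]
beta = fit_ols(features, targets)
```

Solving the normal equations directly is fine for a handful of features; with strongly collinear features a QR- or SVD-based solver is numerically safer.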

2.特征构建
2. Feature building

根据比赛规则,模型必须仅使用提供的数据集,在没有GDP和人口数据的情况下,仍然要能够构建有效的预测模型。因此需要从现有数据中挖掘出替代这些经济指标的变量。这涉及到从奖牌历史数据、运动员表现、参赛项目数量等方面提取特征。
According to the rules of the competition, the model must use only the provided dataset and still be able to build an effective predictive model in the absence of GDP and population data. Therefore, it is necessary to mine the available data for variables that replace these economic indicators. This involves extracting features from historical medal data, athlete performance, number of events, and more.

3.模型重构与验证
3. Model reconstruction and validation

| 特征名称 Feature | 符号 Symbol | 计算方法 Calculation method | 经济学意义 Economic meaning |
|---|---|---|---|
| 综合经济实力指数 / Comprehensive economic strength index | CESI | 主成分分析(体育投入EMA、奖牌效率、项目集中度)/ Principal component analysis (EMA of sports investment, medal efficiency, event concentration) | 国家体育资源投入质量与战略有效性 / Quality of national sports resource input and effectiveness of strategy |
| 综合人才基数指数 / Comprehensive talent base index | CTPI | 加权合成(项目渗透率、运动员持续率、年龄结构熵)/ Weighted composite (event penetration, athlete retention, age-structure entropy) | 人才储备厚度与培养体系可持续性 / Depth of the talent pool and sustainability of the training system |
| 动态东道主效应 / Dynamic host effect | HostBoost | 三阶段动态模型:当届效应+遗产效应+项目杠杆效应 / Three-stage dynamic model: current-Games effect + legacy effect + event leverage effect | 主场优势的量化 / Quantification of home advantage |
| 参赛选手实力指数 / Competitor strength index | ACI | 生涯奖牌数×年龄适配×稳定性×项目权重 / Career medal count × age fit × stability × event weight | 运动员个人竞技水平与战略价值 / Athletes' individual competitive level and strategic value |
| 战略聚焦指数 / Strategic focus index | SFI | 优势项目参赛人数占比(历史奖牌最多的前3个项目)/ Share of participants in dominant events (the 3 events with the most historical medals) | 资源集中度与重点突破策略 / Resource concentration and targeted breakthrough strategy |
| 项目多样性指数 / Event diversity index | SDI | 香农熵(各项目参赛人数分布)/ Shannon entropy (distribution of participants across events) | 项目覆盖广度与均衡性 / Breadth and balance of event coverage |
| 参赛项目主成分 / Principal component of events entered | Breadth_PC | PCA合成(绝对参赛项目数、相对参赛广度)/ PCA composite (absolute number of events entered, relative breadth of participation) | 消除共线性后的综合参赛规模指标 / Composite participation-scale indicator after removing collinearity |

模型特征体系
Model feature system
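
Two of the features in the table have definitions simple enough to sketch directly. The snippet below is a hedged illustration, assuming per-country dictionaries of entries and historical medals keyed by sport; the function names, the use of the natural logarithm, and the sample numbers are assumptions, since the paper does not give these details.

```python
import math
from collections import Counter


def shannon_diversity(entries_per_sport):
    """SDI: Shannon entropy (natural log) of the distribution of entries across sports."""
    total = sum(entries_per_sport.values())
    entropy = 0.0
    for count in entries_per_sport.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def strategic_focus(entries_per_sport, medals_per_sport, top_k=3):
    """SFI: share of entries in the top_k sports ranked by historical medals."""
    top_sports = [sport for sport, _ in Counter(medals_per_sport).most_common(top_k)]
    total = sum(entries_per_sport.values())
    focused = sum(entries_per_sport.get(sport, 0) for sport in top_sports)
    return focused / total


# Hypothetical country: entries spread evenly over four sports.
entries = {"Swimming": 10, "Athletics": 10, "Judo": 10, "Rowing": 10}
medals = {"Swimming": 5, "Athletics": 3, "Judo": 1, "Rowing": 0}

sdi = shannon_diversity(entries)
sfi = strategic_focus(entries, medals)
```

With entries spread evenly over four sports, the entropy reaches its maximum of $\ln 4$, while three of the four sports count as dominant, giving an SFI of 0.75.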

4.模型公式
4. Model formula

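金牌数预测模型:
Gold medal prediction model:

$$\mathrm{Gold} = \beta_0 + \beta_1\,\mathrm{CESI} + \beta_2\,\mathrm{CTPI} + \beta_3\,\mathrm{HostBoost} + \beta_4\,\mathrm{ACI} + \beta_5\,\mathrm{SFI} + \beta_6\,\mathrm{Breadth\_PC} + \varepsilon \tag{2}$$

总奖牌数预测模型:
Total medal prediction model:

$$\mathrm{Total} = \alpha_0 + \alpha_1\,\mathrm{CESI} + \alpha_2\,\mathrm{CTPI} + \alpha_3\,\mathrm{HostBoost} + \alpha_4\,\mathrm{ACI} + \alpha_5\,\mathrm{SFI} + \alpha_6\,\mathrm{Breadth\_PC} + \eta \tag{3}$$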

(二)随机森林回归模型
(2) Random forest regression model

通过使用随机森林回归模型对每个国家的金牌数和总奖牌数进行预测训练数据包括年份、运动员数量和赛事数量,目标变量是金牌数和总奖牌数。数据被拆分为训练集和测试集,以评估模型的性能。
The number of gold medals and total medals for each country was predicted by using a random forest regression model, and the training data included the year, number of athletes, and number of events, with the target variables being the number of gold medals and the total number of medals. The data is split into a training set and a test set to evaluate the model's performance.
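
A minimal sketch of this workflow with scikit-learn is shown below. It is not the team's code: the data are synthetic, the target relationship is invented purely so the example runs, and the hyperparameters are placeholders. It uses `RandomForestRegressor`, an expanding-window `TimeSeriesSplit` for cross-validation, and a final hold-out of the most recent Games as the test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the country-by-Games table, sorted by year.
year = np.repeat(np.arange(1908, 2028, 4), 4)   # 30 editions x 4 hypothetical countries
n_rows = year.size
athletes = rng.integers(5, 600, size=n_rows)
events = rng.integers(1, 300, size=n_rows)
X = np.column_stack([year, athletes, events])

# Invented target: medals grow with delegation size, plus noise, floored at zero.
total_medals = np.clip(0.15 * athletes + rng.normal(0, 5, n_rows), 0, None)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Expanding-window folds: each validation block is later in time than its training block.
cv = TimeSeriesSplit(n_splits=5)
neg_mse = cross_val_score(model, X, total_medals, cv=cv,
                          scoring="neg_mean_squared_error")
cv_mse_mean, cv_mse_std = -neg_mse.mean(), neg_mse.std()

# Hold out the most recent edition as the test set and fit on everything earlier.
train, test = year < 2024, year == 2024
model.fit(X[train], total_medals[train])
predicted = model.predict(X[test])
```

A time-ordered split is used instead of a random one so that no validation fold is predicted from later Games, which would leak future information into the fit.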

1.特征工程优化
1. Feature engineering optimization

| 特征类别 Feature category | 具体特征 Specific feature | 处理方式 Processing |
|---|---|---|
| 经济实力 / Economic strength | CESI(综合经济实力指数)/ CESI (comprehensive economic strength index) | 标准化 / Standardization |
| 人才储备 / Talent pool | CTPI(综合人才基数指数)/ CTPI (comprehensive talent base index) | 标准化 / Standardization |
| 东道主效应 / Host effect | HostBoost(动态三阶段模型)/ HostBoost (dynamic three-stage model) | 原值输入 / Raw value |
| 参赛规模 / Participation scale | Breadth_PC(参赛广度主成分)/ Breadth_PC (principal component of participation breadth) | PCA保持成分得分 / PCA component scores kept |
| 战略聚焦 / Strategic focus | SFI(战略聚焦指数)/ SFI (strategic focus index) | 原值输入 / Raw value |
| 项目多样性 / Event diversity | SDI(香农多样性指数)/ SDI (Shannon diversity index) | 原值输入 / Raw value |
| 运动员实力 / Athlete strength | ACI(运动员实力指数)/ ACI (athlete strength index) | 标准化+异常值Winsorize / Standardization + winsorization of outliers |

特征清单(基于前期构造)
Feature list (based on the earlier construction)

2.新增特征(提升非线性捕捉能力)
2. New features (to better capture nonlinearity)

项目竞争力梯度:
Event competitiveness gradient:

$$\mathrm{CompeteGrade}_s = \frac{\text{country's medals in sport } s}{\text{global medals in sport } s} \times \mathrm{CESI} \tag{4}$$

3.年龄效能曲线:使用三次样条插值拟合各项目最优年龄分布概率密度函数
3. Age-performance curve: cubic spline interpolation is used to fit the probability density function of the optimal age distribution for each event.

二、夏季奥运会奖牌榜的预测模型检验与评估以及求解
2. Testing, evaluation, and solution of the medal table prediction model for the Summer Olympics

| 统计性能指标 Statistical performance metric | 经典线性回归模型1 Classical linear regression model 1 | 经典线性回归模型2 Classical linear regression model 2 | 随机森林回归1模型 Random forest regression model 1 | 随机森林回归2模型 Random forest regression model 2 |
|---|---|---|---|---|
| | 0.0269 | 0.0453 | -0.1289 | -0.1005 |
| 小微电子 | 0.5861 | 0.4448 | 0.6800 | 0.5128 |
| | 0.1859 | 0.1839 | 0.2084 | 0.2064 |

假设检验
Hypothesis testing

残差正态性p值: 0.0000
Residual normality p-value: 0.0000

参数估计
Parameter estimation

使用普通最小二乘(OLS)估计回归系数,确保残差平方和最小化
The regression coefficient is estimated using the ordinary least squares (OLS) method, ensuring that the sum of squares of the residuals is minimized

显著性检验
Significance test

t 检验 p 值 < 0.05
t-test p-value < 0.05

F 检验 p 值 < 0.01
F test p-value < 0.01

为了更好的验证哪种模型适用于奥运会奖牌的预测,所以选择水上游泳项目获得金牌数量进行两个模型的验证比较,如图:
To check which model is better suited to predicting Olympic medals, both models were validated on the number of gold medals won in swimming, as shown in the figure:

图 1-CHN水上游泳项目金牌数模型验证
Figure 1 - Model validation on CHN gold medals in swimming

由图可知,水上游泳项目获得金牌数量模型的验证结果,随机森林回归模型明显优于经典线性回归模型,选择随机森林回归模型进行对夏季奥运会奖牌榜的预测。
As the figure shows, the random forest regression model clearly outperforms the classical linear regression model on the swimming gold medal validation, so the random forest regression model is used to predict the medal table of the Summer Olympics.

夏季奥运会奖牌榜的预测结果与不确定性量化
Prediction results and uncertainty quantification of the medal table for the Summer Olympics

1.预测2028奖牌数(金牌与总奖牌数)
1. Predict the number of medals in 2028 (gold medal and total medals)

基于经济学与体育学理论构建6项核心特征:综合经济实力指数(CESI)、综合人才基数指数(CTPI)、动态东道主效应(HostBoost)、参赛选手实力指数(ACI)、战略聚焦指数(SFI)、参数项目主成分(Breadth_PC),计算2024年的数据,预测了2028年所有国家的奖牌数分布。
Six core features were built from economic and sports-science theory: the comprehensive economic strength index (CESI), the comprehensive talent base index (CTPI), the dynamic host effect (HostBoost), the competitor strength index (ACI), the strategic focus index (SFI), and the principal component of events entered (Breadth_PC). These were computed from the 2024 data and used to predict the distribution of medals across all countries in 2028.


| 随机森林回归模型 Random forest regression model | 时间序列 CV MSE 均值 Time-series CV MSE mean | 时间序列 CV MSE 标准差 Time-series CV MSE standard deviation | 残差正态性 p 值 Residual normality p-value |
|---|---|---|---|
| 测试集 Test set | 0.6748 | 0.7383 | 0.0000 |
| 训练集 Training set | 0.4653 | 0.5229 | 0.0000 |

表3- 基于随机森林回归模型的奖牌榜预测结果检验
Table 3 - Test of the medal table prediction results from the random forest regression model

图 1-2028奥运会各国家奖牌分布
Figure 1 - Distribution of medals by country at the 2028 Olympic Games

图1为绘制的散点图矩阵(Pair Plot),展示了金、银、铜、总和,多个变量之间的两两关系,主对角线上显示的是每个变量的核密度估计图,反映了金牌数、银牌数、铜牌数和总奖牌数各自的分布。非主对角线上的子图展示了不同变量之间的关系,通过散点的分布能够直观地看出两者之间是否存在某种关联(如正相关、负相关或者无明显关联)。
Figure 1 is a scatter-plot matrix (pair plot) showing the pairwise relationships among the gold, silver, bronze, and total medal counts. The main diagonal shows the kernel density estimate of each variable, reflecting the distribution of gold, silver, bronze, and total medals respectively. The off-diagonal panels show the relationships between pairs of variables; the scatter of points indicates whether two variables are related (positively, negatively, or not noticeably).

但为了加强预测奖牌数量的说服力,所以还进行了历届参加奥运会比赛获奖的男女比例的分析,如图3所示:
To make the medal predictions more convincing, we also analyzed the gender split of medal winners at past Olympic Games, as shown in Figure 3:

图 3-不同性别参赛获奖
Figure 3 - Medal winners by gender

基于上述的分析,最终得到美国2028年的金牌与银牌数量预测,输出结果如图3
Based on the above analysis, the final forecast for the number of gold and silver medals in the United States in 2028 is obtained, and the output results are shown in Figure 3

图 3-美国2028年的金牌与银牌数量预测
Figure 3 - U.S. Gold and Silver Medal Forecast for 2028

2.国家获得奖牌进退步情况
2. The progress and regression of the country in winning medals

定义进退步的基准:对比同一国家不同届次
Benchmark for progress and decline: compare the same country across different Games

如果2028年的金牌数大于2024年,那么认为该国家表现有进步,反之则认为该国家表现退步,公式如下:
If a country's predicted 2028 gold medal count exceeds its 2024 count, its performance is considered to have improved; otherwise it is considered to have declined. The formula is:

$$\mathrm{GoldChange}_i = \mathrm{PredictedGold}_{i,2028} - \mathrm{Gold}_{i,2024} \tag{5}$$

其中,$\mathrm{GoldChange}_i$ 为国家 i 在2028年预测的金牌数与2024年金牌数之间的差值,$\mathrm{PredictedGold}_{i,2028}$ 为国家 i 预测的2028年金牌数,$\mathrm{Gold}_{i,2024}$ 为国家 i 在2024年实际获得的金牌数。
Here $\mathrm{GoldChange}_i$ is the difference between country i's predicted 2028 gold medal count and its 2024 gold medal count, $\mathrm{PredictedGold}_{i,2028}$ is the number of gold medals predicted for country i in 2028, and $\mathrm{Gold}_{i,2024}$ is the number of gold medals country i actually won in 2024.

$$\mathrm{Progress}_i = \begin{cases} \text{“Progress”}, & \mathrm{GoldChange}_i > 0 \\ \text{“Regress”}, & \mathrm{GoldChange}_i \le 0 \end{cases} \tag{6}$$
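
The classification rule can be sketched in a few lines. The function name, the label for the non-improving case, and the three-letter codes and counts below are hypothetical; the rule itself (predicted 2028 gold minus actual 2024 gold, positive means progress) follows the text above.

```python
def classify_progress(predicted_2028, actual_2024):
    """Label each country by the sign of (predicted 2028 gold - actual 2024 gold)."""
    labels = {}
    for noc, predicted in predicted_2028.items():
        # Countries with no 2024 gold are treated as having won 0.
        change = predicted - actual_2024.get(noc, 0)
        labels[noc] = ("Progress" if change > 0 else "Regress", change)
    return labels


# Hypothetical country codes and counts, for illustration only.
predicted_2028 = {"AAA": 3, "BBB": 2, "CCC": 1}
actual_2024 = {"AAA": 1, "BBB": 5}

labels = classify_progress(predicted_2028, actual_2024)
improving = sorted(noc for noc, (label, _) in labels.items() if label == "Progress")
```

Note that under this binary rule a country whose predicted count equals its 2024 count falls in the non-improving group.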

最终的输出结果如图4所示,图4更加直观展示了不同国家在2024年后的退步情况以及进步情况,蓝色表示该国家进步,橙色则表明退步
The final output is shown in Figure 4, which provides a more visual picture of the regression and progress of different countries after 2024, with blue indicating the country's progress and orange indicating regression

图 4 - 不同国家在2024年后的退步情况以及进步情况
Figure 4 - Regression and progress of different countries after 2024

分析结果可知,共150个国家,其中有103个国家2028年预测金牌数大于2024年(例如冈比亚、墨西哥、科摩罗),有47个退步国家(例如哥伦比亚、伊朗、荷兰 )
The analysis covers 150 countries in total: 103 are predicted to win more gold medals in 2028 than in 2024 (e.g. Gambia, Mexico, Comoros), and 47 are predicted to decline (e.g. Colombia, Iran, the Netherlands).

未获奖国家的获奖概率预测模型
First-medal probability prediction model for countries without a medal

可以延用随机森林回归与交叉验证优化模型,数据已经在之前模型准备合并完成。
The random forest regression with cross-validation can be reused here; the data were already merged during model preparation.

在上述模型的基础上,预测它们是否有可能在2028年首次获得奖牌。通过基于经济学与体育学理论构建6项核心特征:综合经济实力指数(CESI)、综合人才基数指数(CTPI)、动态东道主效应(HostBoost)、参赛选手实力指数(ACI)、战略聚焦指数(SFI)、参数项目主成分(Breadth_PC),用预测的奖牌数来估计是否有可能获得首枚奖牌。
Building on the model above, we predict whether these countries are likely to win their first medal in 2028. Using the six core features built from economic and sports-science theory (CESI, CTPI, HostBoost, ACI, SFI, and Breadth_PC), the predicted medal count is used to judge whether a first medal is likely.

若预测总奖牌数大于0,认为该国家在2028年能获得首枚奖牌公式如下:
If the total number of medals is predicted to be greater than 0, the country is considered to win its first medal in 2028, and the formula is as follows:

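1)使用回归模型计算国家i在2028年的总奖牌数:
1) Use the regression model to calculate the total number of medals for country i in 2028:

$$\mathrm{PredictedGold} = f(\mathrm{CESI}, \mathrm{CTPI}, \mathrm{HostBoost}, \mathrm{ACI}, \mathrm{SFI}, \mathrm{Breadth\_PC}) \tag{7}$$

2)基于预测的金牌数判断是否获得首枚奖牌:
2) Determine whether the first medal is won, based on the predicted number of gold medals:

$$\mathrm{FirstMedal} = \begin{cases} 1, & \mathrm{PredictedGold} > 0 \\ 0, & \mathrm{PredictedGold} \le 0 \end{cases} \tag{8}$$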

未获奖国家的获奖概率预测结果
First-medal probability prediction results for countries without a medal

图 5-获得首枚奖牌国家预测分布
Figure 5 - Projected distribution of countries with their first medals

比赛项目与各国奖牌数量之间的关系研究
A study of the relationship between competition events and the number of medals in each country

该问题旨在分析每个国家在不同赛事中的表现,并确定不同国家最重要的运动。这可以通过计算金、银、铜奖牌数量、奖牌比率以及奖牌分数,进而得出每个国家在各个赛事中的表现,并为每个国家找出最重要的赛事
The question aims to analyze the performance of each country in different events and to identify the most important sports in different countries. This can be done by calculating the number of gold, silver and bronze medals, the medal ratio, and the medal points, which can be used to determine how each country has performed in each event and to identify the most important events for each country.

1.计算金、银、铜奖牌数
1. Count the number of gold, silver and bronze medals

根据summerOly_athletes.csv中的MEDAL列(包含奖牌类型:Gold、Silver、Bronze),将MEDAL列的每个值与对应的奖牌类型比较,并为每种类型赋予相应的值(1或0),计算每种奖牌及总奖牌的数量,公式如下:
Using the MEDAL column of summerOly_athletes.csv (medal types: Gold, Silver, Bronze), each value is compared with each medal type and assigned 1 or 0 accordingly, giving the count of each medal type and the total. The formulas are:

$$\mathrm{Gold} = (\mathrm{MEDAL} = \text{'Gold'})$$

$$\mathrm{Silver} = (\mathrm{MEDAL} = \text{'Silver'})$$

$$\mathrm{Bronze} = (\mathrm{MEDAL} = \text{'Bronze'})$$

$$\mathrm{Total} = \mathrm{Gold} + \mathrm{Silver} + \mathrm{Bronze} \tag{9}$$

2.汇总奖牌数
2. Summarize the number of medals

按NOC(国家代码)和Event(赛事名称)进行汇总,计算每个国家在每个赛事中的金,银,铜奖牌数以及总奖牌数公式如下:
Aggregate by NOC (country code) and event (event name) to calculate the number of gold, silver, bronze medals and total medals of each country in each event, the formula is as follows:

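$$\mathrm{Gold}_{i,j} = \sum_{\mathrm{NOC} = i,\ \mathrm{Event} = j} \mathrm{Gold}$$

$$\mathrm{Silver}_{i,j} = \sum_{\mathrm{NOC} = i,\ \mathrm{Event} = j} \mathrm{Silver}$$

$$\mathrm{Bronze}_{i,j} = \sum_{\mathrm{NOC} = i,\ \mathrm{Event} = j} \mathrm{Bronze}$$

$$\mathrm{Total}_{i,j} = \mathrm{Gold}_{i,j} + \mathrm{Silver}_{i,j} + \mathrm{Bronze}_{i,j} \tag{10}$$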

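3.计算奖牌比率
3. Calculate the medal ratios

对于每个国家在每个项目中的金、银、铜奖牌数量,计算它们相对于总奖牌数的比率
For each country's gold, silver, and bronze medal counts in each event, calculate their ratio to the total medal count:

$$\mathrm{GoldRatio}_{i,j} = \frac{\mathrm{Gold}_{i,j}}{\mathrm{Total}_{i,j}}$$

$$\mathrm{SilverRatio}_{i,j} = \frac{\mathrm{Silver}_{i,j}}{\mathrm{Total}_{i,j}}$$

$$\mathrm{BronzeRatio}_{i,j} = \frac{\mathrm{Bronze}_{i,j}}{\mathrm{Total}_{i,j}} \tag{11}$$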

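4.计算综合分数
4. Calculate the composite score

计算完比率之后,为每个赛事计算金、银、铜奖牌的综合分数,每种奖牌的分数是奖牌数与相应比率的乘积。
Once the ratios have been calculated, a composite score of the gold, silver, and bronze medals is computed for each event; the score for each medal type is the product of the medal count and the corresponding ratio.

$$\mathrm{GoldScore}_{i,j} = \mathrm{Gold}_{i,j} \times \mathrm{GoldRatio}_{i,j}$$

$$\mathrm{SilverScore}_{i,j} = \mathrm{Silver}_{i,j} \times \mathrm{SilverRatio}_{i,j}$$

$$\mathrm{BronzeScore}_{i,j} = \mathrm{Bronze}_{i,j} \times \mathrm{BronzeRatio}_{i,j} \tag{12}$$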
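
The counting, ratio, and scoring steps can be chained as in the sketch below. It is an illustration only: the row keys (`NOC`, `Event`, `Medal`), the function names, and the sample rows are assumptions, and summing the three per-colour scores into a single composite is one plausible reading of the "composite score", not a confirmed detail of the paper.

```python
from collections import defaultdict


def event_scores(athlete_rows):
    """Aggregate medals by (NOC, Event) and compute ratio-weighted composite scores."""
    counts = defaultdict(lambda: {"Gold": 0, "Silver": 0, "Bronze": 0})
    for row in athlete_rows:
        # Rows without a medal (any other value) are skipped.
        if row["Medal"] in ("Gold", "Silver", "Bronze"):
            counts[(row["NOC"], row["Event"])][row["Medal"]] += 1

    scores = {}
    for key, c in counts.items():
        total = c["Gold"] + c["Silver"] + c["Bronze"]
        # Per-colour score = count x (count / total); the composite is their sum (assumed).
        scores[key] = sum(n * (n / total) for n in c.values())
    return scores


def most_important_event(scores):
    """For each NOC, return the (event, score) pair with the highest composite score."""
    best = {}
    for (noc, event), score in scores.items():
        if noc not in best or score > best[noc][1]:
            best[noc] = (event, score)
    return best


# Illustrative rows only.
rows = [
    {"NOC": "USA", "Event": "Swimming", "Medal": "Gold"},
    {"NOC": "USA", "Event": "Swimming", "Medal": "Gold"},
    {"NOC": "USA", "Event": "Swimming", "Medal": "Silver"},
    {"NOC": "USA", "Event": "Athletics", "Medal": "Bronze"},
    {"NOC": "USA", "Event": "Athletics", "Medal": "No medal"},
]

scores = event_scores(rows)
best = most_important_event(scores)
```

In this example the swimming score is 2 × 2/3 + 1 × 1/3 = 5/3 and the athletics score is 1, so swimming is reported as the most important event.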

5.不同项目的奖牌分布
5. Medal distribution in different events

因为要探讨的是每个比赛项目与奖牌数的关系,这里对于每个国家采用加权比率算法,基于其在各赛事中的综合得分(包括金、银、铜的综合分数),如图:
Because the aim is to examine the relationship between each event and the medal count, a weighted ratio algorithm is applied to each country, based on its composite score in each event (combining the gold, silver, and bronze scores), as shown in the figure:

图 6总奖牌在不同项目的分布情况
Figure 6: Distribution of total medals in different events

东道主选择对比赛项目结果的影响
The impact of the host country's choice on the outcome of the event

1.东道主选择比赛项目结果影响的分析
1. Analysis of the impact of host selection on the outcome of the competition

该问题需要探讨东道主选择对比赛项目结果的影响,通过如图6所示的一个SHAP图像,表示特征对模型输出的影响程度。SHAP值越大(右侧的红色区域),说明该特征对模型的预测结果影响越大,反之亦然。
This part examines how the host's choice of events affects results. The SHAP plot in Figure 6 shows how strongly each feature influences the model output: the larger the SHAP value (the red area on the right), the greater the feature's influence on the prediction, and vice versa.

图 6 - SHAP 值(对输出的影响)
Figure 6 - SHAP values (effect on model output)

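2.东道主选择比赛项目结果影响的模型建立
2. Modeling the effect of the host's event choices on results

1)得奖数的汇总公式:
1) Formula for aggregating medal counts:

对于每个国家和每个运动项目,计算金,银,铜奖牌数的总和,并计算总奖牌数:
For each country and each sport, sum the gold, silver, and bronze medals to obtain the total medal count:

$$\mathrm{Total} = \mathrm{Gold} + \mathrm{Silver} + \mathrm{Bronze} \tag{13}$$

2)计算东道主与非东道主国家的平均奖牌数:
2) Calculate the average medal counts of host and non-host countries:

$$\overline{M}_{\mathrm{host}} = \frac{\sum \text{medals won by host countries}}{\text{number of host countries}}, \qquad \overline{M}_{\mathrm{non\text{-}host}} = \frac{\sum \text{medals won by non-host countries}}{\text{number of non-host countries}} \tag{14}$$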

3) Rank each country's events by medal count to identify its most important events:

The analysis produces a table of medal counts for each country in each sport, compares the medal counts of host and non-host countries, and, after sorting by country and medal count, lists the most important events for each country. The visualization is shown in Figure 3.

Figure 6 - Number of medals won by different countries

Among the factors that influence medal counts (CESI, CTPI, HostBoost, ACI, SFI, and Breadth_PC), Figure 8 shows that HostBoost also affects event results.

Figure 6 - Effect of HostBoost on event results

Question two

Problem 2: Analysis

For Question 2, three countries were selected (China, CHN; the United States, USA; and Romania, ROU) to test for a "great coach" effect: verify whether coaches significantly affect a country's medal count in a specific sport, quantify how much the effect raises the medal count, and then give strategic recommendations.

Idea 1: Quantify the coach effect by comparing medals before and after a coaching change, using a weighted medal score with regularization. For the Chinese and U.S. women's volleyball data, gold, silver, and bronze medals are given different weights and the weighted score is standardized. Standardization is needed because total medal counts differ between Games: when a Games adds events and awards more medals, the total weight may rise naturally, which is not necessarily a coach effect. The standardized weighted medal total of each country at each Games is then predicted. Because the features are numerous and correlated, ridge regression is used to check whether the data show a possible great coach effect. For the U.S. and Romanian gymnastics data, LASSO (L1 regularization) is used instead.

Idea 2: "Model the impact of coaching changes on medals." Build a panel regression model that controls for country-sport fixed effects and time-varying factors. The core variable is the coaching-change indicator (CoachChange), and its regression coefficient β quantifies the coach effect on medal counts. The "great coach" effect is assumed to appear as a discontinuous jump in a country's medals in a given sport, and the hypothesis is checked against known cases (such as Lang Ping and Béla Károlyi).

Idea 3: Combine the control capacity of panel regression with the weighted medal score and regularization to form a multi-stage hybrid model. In the model structure, the weighted medal score enters the panel regression framework as the dependent variable, and regularization handles high-dimensional data and collinearity. For coach-effect quantification, the combined model must ensure that the coaching-change coefficient β accurately reflects its impact on the weighted medal score while other confounders are controlled. Finally, cross-validation and statistical tests verify the stability and predictive ability of the combined model.

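Problem 2: Model building

1. Weighted medal score with regularization

1. Data preparation and preprocessing

Medal weighting: assign weights to gold, silver, and bronze medals (gold = 3, silver = 2, bronze = 1) and compute each country's weighted medal score at each Games:

$$\text{Score} = 3 \times G + 2 \times S + 1 \times B$$

Standardization: rescale the score Games by Games, dividing it by the mean score of that Games, to remove differences in scale:

$$\text{Score}^{\text{std}} = \frac{\text{Score}}{\text{mean score of that Games}}$$

Coach change marking: mark the timing of each coaching change from public reports and build a binary variable CoachChange (1 after the change, 0 otherwise).

Lag effect: account for the impact 1-2 years after the coach takes office (for example, through lagged CoachChange terms).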
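The preprocessing steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the lag is counted in rows of the per-Games series rather than calendar years, and the numbers in the usage lines are made up.

```python
def weighted_score(gold, silver, bronze):
    """Weighted medal score with gold = 3, silver = 2, bronze = 1."""
    return 3 * gold + 2 * silver + 1 * bronze

def standardize_by_games(scores_by_country):
    """Divide each country's score by the mean score of the same Games."""
    mean = sum(scores_by_country.values()) / len(scores_by_country)
    return {c: (s / mean if mean else 0.0) for c, s in scores_by_country.items()}

def lagged(flags, lag):
    """Shift a 0/1 CoachChange series later by `lag` rows, padding with 0."""
    flags = list(flags)
    return ([0] * lag + flags)[:len(flags)]

# Hypothetical values for one Games and one coaching-change series.
one_games = {"AAA": weighted_score(1, 1, 1), "BBB": weighted_score(0, 1, 0)}
standardized = standardize_by_games(one_games)
coach_change = [0, 1, 1, 0]
coach_change_lag1 = lagged(coach_change, 1)
```

Adding `coach_change_lag1` as an extra regressor lets the model attribute part of the effect to the period after the coach has settled in.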

2. China-U.S. women's volleyball case: weighted ridge regression

$$\text{Score}_{i,t} = \beta_0 + \beta_1\,\text{CoachChange}_{i,t} + \theta^{\top} X_{i,t}$$

Features $X_{i,t}$: number of athletes, moving average of past results, etc.

Regularization: ridge regression (L2 regularization) with penalty term $\lambda \sum_j \beta_j^2$ to prevent overfitting.
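To make the ridge step concrete, the sketch below fits the closed-form ridge solution with a single regressor, the coaching flag. It is not the paper's model: it uses only the Chinese volleyball and Lang Ping (China) columns from the data table in this section, omits the control features, and leaves the intercept unpenalized by centering. The function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Return (intercept, coefficients); only the coefficients are penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with L2 penalty: (Xc'Xc + lam * I) beta = Xc'yc
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

# Chinese volleyball column and Lang Ping (China) flag, 1984-2024.
score = [10, 11, 0, 9, 0, 12, 12, 0, 12, 0, 0]
coach = [[0], [0], [0], [0], [0], [1], [0], [0], [0], [0], [0]]
b0_ols, beta_ols = ridge_fit(coach, score, lam=0.0)
b0_ridge, beta_ridge = ridge_fit(coach, score, lam=1.0)
```

With `lam=0.0` the coefficient is simply the difference in mean score between flagged and unflagged Games; a positive `lam` shrinks it toward zero, which is the overfitting protection the text describes.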

3. U.S.-Romania gymnastics case: LASSO regression

$$\text{Score}_{i,t} = \beta_0 + \beta_1\,\text{CoachChange}_{i,t} + \theta^{\top} X_{i,t}$$

Features $X_{i,t}$: coach tenure, athlete age structure, performance in international competitions, etc.

Regularization: LASSO (L1 regularization) with penalty term $\lambda \sum_j |\beta_j|$, which automatically selects the key variables.
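The variable-selection behavior of the L1 penalty can be seen in a small coordinate-descent sketch. This is a generic textbook formulation, minimizing (1/2n)·‖y − Xβ‖² + λ·Σ|β_j| on centered data without an intercept; it is not the authors' implementation, and the toy data are invented so that the second feature carries almost no signal.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum|b_j| (centered data)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Residual that ignores feature j's current contribution.
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z else 0.0
    return beta

# Toy data: y = 2 * x1 + 0.1 * x2, with orthogonal centered features.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_cd(X, y, lam=0.5)
```

The weak second feature is set exactly to zero while the strong one is only shrunk, which is what "automatically selects the key variables" means in practice.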

2. Panel regression model

A panel regression model controls for country-sport fixed effects and time-varying factors:

$$\text{Medal}_{ijt} = \mu_{ij} + \gamma_t + \beta\,\text{CoachChange}_{ijt} + \theta^{\top} X_{ijt}$$

$\mu_{ij}$: country-sport fixed effect; $\gamma_t$: year fixed effect.
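The role of the country-sport fixed effect can be illustrated with the within transformation: subtracting each entity's own means removes any time-invariant baseline before the slope is estimated. The sketch below is a simplification that assumes a single regressor and omits the year fixed effects and controls; the entity labels and numbers are invented.

```python
def within_slope(panel):
    """Slope of y on x after removing each entity's own means (fixed effects).

    panel: {entity: [(coach_change, medals), ...]} (assumed layout).
    """
    num = 0.0
    den = 0.0
    for observations in panel.values():
        x_mean = sum(x for x, _ in observations) / len(observations)
        y_mean = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            num += (x - x_mean) * (y - y_mean)
            den += (x - x_mean) ** 2
    if den == 0.0:
        raise ValueError("the regressor never varies within any entity")
    return num / den

# Two hypothetical country-sport pairs with very different baselines
# (10 and 1 medals) but the same true effect of +2 medals.
panel = {
    ("AAA", "Sport 1"): [(0, 10), (0, 10), (1, 12), (1, 12)],
    ("BBB", "Sport 2"): [(0, 1), (1, 3), (1, 3), (1, 3)],
}
beta = within_slope(panel)
```

A pooled regression on these rows would mix the baseline gap between the two entities into the coefficient; the within estimator recovers the common +2 effect.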

3. Multi-stage hybrid model

1. Core panel regression structure

$$\text{Score}_{ijt} = \mu_{ij} + \gamma_t + \beta\,\text{CoachEffect}_{ijt} + \theta^{\top} X_{ijt}$$

CoachEffect: the coaching effect variable (for example, CoachChange, or CoachTenure, the number of years the coach has served).

2. Regularization (elastic net)

$$\min_{\beta,\theta}\left\{ \sum_{i,j,t}\left(\text{Score}_{ijt} - \widehat{\text{Score}}_{ijt}\right)^2 + \lambda\left(\alpha \lVert \beta \rVert_1 + (1-\alpha)\lVert \beta \rVert_2^2\right) \right\}$$

The elastic net (L1 + L2 regularization) balances variable selection against model stability.

The regularization strength λ and the L1/L2 mixing ratio α are tuned by cross-validation.
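As a small check on the roles of λ and α, the sketch below evaluates an elastic-net penalty of the form λ(α·‖β‖₁ + (1 − α)·‖β‖₂²). The function name and the coefficient vector are illustrative assumptions; library implementations often scale the two terms slightly differently.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) * squared L2 norm)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) * l2_squared)

beta = [3.0, -4.0]  # hypothetical coefficients
pure_lasso = elastic_net_penalty(beta, lam=1.0, alpha=1.0)   # L1 only
pure_ridge = elastic_net_penalty(beta, lam=1.0, alpha=0.0)   # L2 only
mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.5)
```

Setting α = 1 recovers the LASSO penalty and α = 0 the ridge penalty, so a cross-validated grid over (λ, α) includes both earlier models as special cases.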

Problem 2: Model solution, testing, and evaluation

Table 4 reports the great coach effect on medals estimated with the multi-stage hybrid model.

| Coach (country) | Effect | Adjusted R² (target > 0.7) | MSE | Coefficient | P-value |
|---|---|---|---|---|---|
| LP (China) | 7.33 | 0.00 | 20.31 | +3.21 | 0.1585 |
| LP (USA) | 7.75 | 0.07 | 82.51 | +1.89 | 0.0569 |
| Károlyi (Romania) | 5.61 | -0.04 | 37.93 | +0.97 | 0.3442 |
| Károlyi (USA) | 4.20 | -0.07 | 122.84 | +2.15 | 0.5859 |

Table 3 - Effect of the great coach effect on medal counts

The results yield the following data table:

| No. | Year | Chinese volleyball | U.S. volleyball | Romanian gymnastics | U.S. gymnastics | Lang Ping (China) | Lang Ping (USA) | Béla Károlyi (Romania) | Béla Károlyi (USA) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1984 | 10 | 24 | 13 | 16 | 0 | 0 | 1 | 0 |
| 1 | 1988 | 11 | 12 | 14 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1992 | 0 | 24 | 10 | 6 | 0 | 0 | 0 | 0 |
| 3 | 1996 | 9 | 0 | 15 | 6 | 0 | 0 | 0 | 0 |
| 4 | 2000 | 0 | 0 | 11 | 3 | 0 | 0 | 0 | 0 |
| 5 | 2004 | 12 | 0 | 20 | 8 | 1 | 0 | 0 | 1 |
| 6 | 2008 | 12 | 24 | 7 | 11 | 0 | 1 | 0 | 1 |
| 7 | 2012 | 0 | 12 | 7 | 3 | 0 | 1 | 0 | 1 |
| 8 | 2016 | 12 | 24 | 0 | 4 | 0 | 1 | 0 | 0 |
| 9 | 2020 | 0 | 12 | 0 | 5 | 0 | 0 | 0 | 0 |
| 10 | 2024 | 0 | 26 | 1 | 3 | 0 | 0 | 0 | 0 |

Table 5 - Medal counts by year, with and without guidance from the named coach

The data show the medals won by China, the United States, and Romania in volleyball and gymnastics between 1984 and 2024, together with the coaching periods of Lang Ping and Béla Károlyi in each country (in the table, "1" and "0" indicate whether the coach was working with that country's team). The United States performed well overall in volleyball, especially in 1984, 1988, 1992, and 2008. China also achieved good results in volleyball in 2004 and 2016. Lang Ping's coaching in China appears to be correlated with stronger Chinese volleyball results, and during Béla Károlyi's tenure in the United States, U.S. gymnastics won a number of medals.

The figure shows the medal trends for Chinese volleyball (1984-2024), U.S. volleyball (1964-2024), Romanian gymnastics (1952-2012), and U.S. gymnastics (1904-2024). The table and the figure correspond to each other and show how fluctuations in each country's medal count line up with the coaching periods.

Figure 6 - "Volleyball and gymnastics medal trends for China, the USA, and Romania, 1984 to 2024"

Question three

Significant differences in medal distribution

Having built the medal prediction model (Question 1) and analyzed the great coach effect (Question 2), we now look for further insights about Olympic medal counts in the model. Drawing on the models and findings above, we use the data to test whether medal distributions differ significantly between countries in different regions.

Treemap of medal counts by country

The treemap shows the hierarchy of the data as nested rectangles. Each country is one rectangle whose area represents its total medal count and whose color is assigned by that total: the color shifts from light blue to dark red, and the more medals, the darker the color.

Regional concentration of traditional strengths

| Region | Representative countries | Strongest events | Key notes |
|---|---|---|---|
| East Asia | China, Japan, Korea | Table tennis, badminton, gymnastics | Accounts for more than 60% of global medals (skill-based events) |
| Europe | United Kingdom, Germany, France | Cycling, rowing, equestrian | Closely tied to historical sports culture and facility investment |
| Americas | United States, Cuba, Brazil | Athletics, swimming, basketball | The U.S. medal share in swimming and athletics reaches 30% (physical-capacity events) |
| Africa | Kenya, Ethiopia | Middle- and long-distance running | Kenya has won nearly 40% of Olympic medals in marathon events |

Geographical and climatic influences

| Geography / climate type | Representative countries | Strongest events | Key notes |
|---|---|---|---|
| Tropical countries | Jamaica, Brazil | Sprints, beach volleyball | Related to climate adaptation and training conditions |
| High-altitude countries | Ethiopia, Colombia | Middle- and long-distance running | Stronger red-blood-cell oxygen-carrying capacity, a natural physiological advantage |

Table - Significant manifestations of regional differences

Regional culture, historical investment, and the characteristics of each sport together shape each country's areas of strength. Natural conditions (such as climate and altitude) affect competitive performance indirectly, through physiological adaptation and the training environment.

Strategic recommendations to Olympic committees

Targeted strategies based on regional differences

| Region | Representative countries | Core strategy | Specific measures and examples |
|---|---|---|---|
| East Asia | China, Japan, Korea | Consolidate traditional strengths; expand into emerging areas | Maintain investment in table tennis and gymnastics and respond to rule changes (e.g., equipment restrictions); position for newly added events such as e-sports and skateboarding, drawing on technological strengths |
| Europe | United Kingdom, Germany, France | Strengthen endurance events; regional cooperation | Invest in youth training systems for cycling and rowing; establish transnational training centers (e.g., the Nordic ski federation model) and share resources |
| Americas | United States, Brazil, Cuba | Optimize physical-capacity events; cultural export | Use data science to improve efficiency in athletics and swimming (e.g., the U.S. "Ranger Program"); promote the global reach of basketball and surfing to attract talent |
| Africa | Kenya, South Africa | Focus on the long-distance running ecosystem; break through in less popular events | Build a "long-distance running economic circle" (e.g., Ethiopia's running towns); bring in foreign coaches to develop events with potential, such as swimming and shooting |

East Asia: balance traditional events with emerging technology-driven areas and avoid the risk of over-specialization.

Europe: use regional cooperation to reduce duplicated investment and increase dominance in endurance events.

Americas: extend the reach of its sports through cultural export, combined with data-driven training optimization.

Africa: focus on the long-distance running industry chain while using international cooperation to break through in less popular events.

Oceania: use its distinctive geography to build differentiated strengths, and offset its small population through alliances.

2. Dynamic adjustment of regional differences

The model also shows that regional advantages are not fixed and must adapt to the following trends:

Globalization weakens traditional barriers: African sprinters (for example, from Nigeria) are rising through European and American training systems and challenging the traditional strengths of Jamaica and the United States.

Impact of climate change: countries in hot regions may gain a stronger adaptive advantage in endurance events (for example, the future potential of Middle Eastern countries in the marathon).

Policy and investment shifts: Belt and Road countries (for example, in Southeast Asia) are upgrading gymnastics and diving facilities with Chinese assistance, which could change the competitive landscape within Asia.

3. Summary

Regional differences significantly affect the distribution of Olympic medals, but with data-driven strategic adjustments each Olympic committee can make the most of its strengths and compensate for its weaknesses. For example, East Asian countries need to balance traditional events with emerging areas to avoid an "over-specialization trap". African countries should exploit their long-distance running advantage while exploring more diverse paths. Small and island countries can overcome resource constraints through regional cooperation and international assistance. Ultimately, regional differences are both a challenge and an opportunity, and accurate analysis combined with flexible strategy will be central to future Olympic competition.

Model evaluation and extension

Strengths

For Problem 1, the random forest model keeps the original data collection and feature engineering steps (handling missing values, encoding categorical variables, standardization, etc.) in data preparation and feature processing. To respect the time-series structure of the data (Olympic years), a time-series cross-validation strategy (such as a rolling window that advances one Games at a time) keeps future data out of the training set. Within the random forest framework, hyperparameter ranges (e.g., number of trees, depth) are preset, and cross-validation is embedded directly in hyperparameter selection.
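The rolling-window validation idea can be sketched as an expanding-window split over Olympic years. This is an illustrative sketch: the function name `rolling_origin_splits` and the `min_train` parameter are assumptions, and the paper does not state the exact window it used.

```python
def rolling_origin_splits(years, min_train=3):
    """Yield (train_years, test_year) pairs in which training always precedes testing."""
    ordered = sorted(set(years))
    for k in range(min_train, len(ordered)):
        yield ordered[:k], ordered[k]

games = [2008, 2012, 2016, 2020, 2024]
splits = list(rolling_origin_splits(games, min_train=3))
```

Each candidate hyperparameter setting would be scored on every test year using a model fitted only on the earlier Games, so no future information leaks into training.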

For Problem 2:

Idea 1, weighted medal score with regularization: ridge regression is well suited to multicollinear features, and its L2 penalty reduces model complexity and avoids overfitting. LASSO performs feature selection and automatically removes unimportant variables.

Idea 2, panel regression: panel data combine time-series and cross-sectional information, so the model can control for individual heterogeneity and rule out confounders that do not change over time. Year fixed effects capture global trends, such as technological progress or the overall rise in medal counts as the Games grow.

Idea 3, multi-stage hybrid model: the combined model controls for confounders (panel fixed effects filter out country-sport-specific interference), offers flexibility and stability (the elastic net adapts to high-dimensional data and avoids overfitting), and is interpretable (SHAP value decomposition gives an intuitive contribution analysis). It uses the control advantage of panel regression (fixed effects) together with the variable selection and overfitting resistance of regularization to improve overall performance.

Deficiencies and improvements

In the Olympic medal prediction model, and especially in the "first medal" prediction, the results may fail to flag some countries that could win their first medal. There are several reasons for this. Because the problem allows only the provided data sets, the model has little training data for countries that have never won a medal or have won only a few, so it struggles to predict a breakthrough at a future Games. As a result, for "emerging" countries, or those that have long underperformed at the Olympics, the model may not identify their future potential accurately. The same applies to countries with sparse historical data (for example, participation in only a few Games, or few athletes and events in some years).

The panel regression model for the impact of coaching changes on medals may overlook lag effects: a coach's effect may take years to build up (for example, through reform of the youth training system), but the model captures only the current-period impact.


Appendix

Formula system for economic-strength proxy indicators (features constructed as a substitute for GDP)

1. Weighted Medal Value (WMV):

$$\mathrm{WMV}_{i,t} = w_G\,G_{i,t} + w_S\,S_{i,t} + w_B\,B_{i,t}$$

Weights: $w_G = 3$, $w_S = 2$, $w_B = 1$, where $G_{i,t}$, $S_{i,t}$, and $B_{i,t}$ are the gold, silver, and bronze medals of country $i$ at the Games of year $t$.

Historical window: with $t$ the current year, use data from the previous $k$ Games ($k = 3$).
2. Core proxy indicator formulas

Sports Investment Momentum (SIM)

$$\mathrm{SIM}_{i,t} = \frac{1}{k}\sum_{j=1}^{k} \lambda^{j}\,\mathrm{WMV}_{i,t-j}, \qquad \lambda = 0.8 \text{ (decay factor)}$$

Here $t-j$ denotes the $j$-th Games before year $t$.

Standardization:

$$\widetilde{\mathrm{SIM}}_{i,t} = \frac{\mathrm{SIM}_{i,t} - \min(\mathrm{SIM})}{\max(\mathrm{SIM}) - \min(\mathrm{SIM})}$$

Economic interpretation: SIM reflects country $i$'s sustained sports investment over the $k$ Games before year $t$; the exponential decay encodes the assumption that recent investment matters more.
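A minimal sketch of the momentum and standardization steps follows. It assumes that the decay weight for the j-th most recent Games is λ raised to the power j, which is one reading of the formula and not confirmed by the text; the function names and sample values are hypothetical.

```python
def sim(wmv_history, lam=0.8):
    """Decayed average of WMV over the previous k Games.

    wmv_history: WMV values ordered most recent first (assumed layout);
    the j-th most recent Games gets weight lam ** j (assumed exponent).
    """
    k = len(wmv_history)
    return sum(lam ** (j + 1) * v for j, v in enumerate(wmv_history)) / k

def min_max(values):
    """Min-max standardization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

example_sim = sim([10, 10, 10], lam=0.5)   # (5 + 2.5 + 1.25) / 3
scaled = min_max([1, 3, 5])
```

With equal WMV in every past Games the result is below the raw value, because older Games contribute progressively less.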

Sport Investment Concentration (SIC)

$$\mathrm{SIC}_{i,t} = 1 - \frac{-\sum_{s=1}^{S} p_{i,s}\,\ln p_{i,s}}{\ln S}$$

Here $p_{i,s}$ is the share of country $i$'s medals won in sport $s$, and $S$ is the total number of sports at that Games.

Economic interpretation: SIC takes values from 0 to 1. The larger the value, the more concentrated the resources, which distinguishes a "targeted breakthrough" strategy from a "broad coverage" strategy.
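One way to obtain a 0-1 concentration measure with this behavior is one minus normalized Shannon entropy. The sketch below assumes that form and assumes the shares are medal shares by sport; both are assumptions about a formula that is only partly legible in the source, and the function name is hypothetical.

```python
import math

def sic(medals_by_sport, total_sports):
    """1 - (Shannon entropy of medal shares) / ln(total_sports)."""
    total = sum(medals_by_sport.values())
    if total == 0 or total_sports < 2:
        return 0.0
    entropy = -sum(
        (m / total) * math.log(m / total)
        for m in medals_by_sport.values()
        if m > 0
    )
    return 1.0 - entropy / math.log(total_sports)

focused = sic({"Athletics": 9}, total_sports=30)                 # all medals in one sport
spread = sic({"Athletics": 5, "Swimming": 5}, total_sports=2)    # evenly spread
```

A country that wins everything in one sport scores 1, and a country whose medals are spread evenly over every sport scores 0.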

Medal Efficiency Ratio (MER)

MER divides $\mathrm{WMV}_{i,t}$ by a term based on $A_{i,t}$, the number of athletes country $i$ enters in year $t$.

Economic interpretation: MER measures medal output per unit of resource input; the denominator is designed to reflect diminishing returns to scale.

3. Composite Economic Strength Index (CESI)

$$\mathrm{CESI}_{i,t} = \alpha\,\widetilde{\mathrm{SIM}}_{i,t} + \beta\,\mathrm{SIC}_{i,t} + \gamma\,\mathrm{MER}_{i,t}$$

where $\widetilde{\mathrm{SIM}}$ is the min-max standardized SIM. The coefficients are determined by principal component analysis:

$$[\alpha, \beta, \gamma] = \mathrm{PCA}([\mathrm{SIM}, \mathrm{SIC}, \mathrm{MER}])$$

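4. Dynamic adjustment mechanism

Adaptive adjustment of the decay factor:

$$\lambda_{i,t} = 0.7 + 0.3\,\mathrm{sigmoid}\!\left(\Delta\mathrm{WMV}_{i,t}\right), \qquad \Delta\mathrm{WMV}_{i,t} = \frac{\mathrm{WMV}_{i,t} - \mathrm{WMV}_{i,t-1}}{\mathrm{WMV}_{i,t-1}}$$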

Formula system for talent-base proxy indicators (features constructed as a substitute for population)

1. Athlete Participation Breadth (APB)

$$\mathrm{APB}_{i,t} = \frac{E_{i,t}}{E_{t} \times \sqrt{A_{i,t}}}$$

Here $E_{i,t}$ is the number of events country $i$ enters in year $t$, $E_{t}$ is the total number of events at that Games, and $A_{i,t}$ is the number of athletes country $i$ enters.

Interpretation: the numerator reflects the breadth of event coverage; taking the square root of the athlete count weakens the scale effect.

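2. Talent Continuity Index (TCI)

$$\mathrm{TCI}_{i,t} = \sum_{k} \delta^{k}\,\frac{A^{(k)}_{i,t}}{A_{i,t}}$$

Here $A_{i,t}$ is the total number of athletes of country $i$ in year $t$, $A^{(k)}_{i,t}$ is the number of athletes who competed $k$ Games earlier and still compete at this Games, and $\delta = 0.9$ is the decay factor (an empirical value).

3. Sport Penetration Rate (SPR)

$$\mathrm{SPR}_{i,t} = \frac{1}{S}\sum_{s=1}^{S} I_{i,s}$$

Here $S$ is the total number of sports at the current Games and $I_{i,s}$ is an indicator function equal to 1 if country $i$ has ever won a medal in sport $s$.

4. Composite Talent Pool Index (CTPI)

$$\mathrm{CTPI}_{i,t} = \alpha\,\mathrm{APB}_{i,t} + \beta\,\mathrm{TCI}_{i,t} + \gamma\,\mathrm{SPR}_{i,t}$$

$$[\alpha, \beta, \gamma] = \mathrm{PCA}([\mathrm{APB}, \mathrm{TCI}, \mathrm{SPR}])$$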

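5. Dynamic adjustment mechanism

Sport-era weight

$$w_s(t) = \frac{1}{1 + e^{-(t - t_s)}}$$

Here $t_s$ is the year sport $s$ was introduced; the weight of a new sport increases over time (an S-shaped curve).

6. Adaptive decay factor

$$\delta = 0.85 + 0.15 \cdot \tanh\!\left(\frac{\mathrm{TCI}}{\max(\mathrm{TCI})}\right)$$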

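Dynamic quantification of the host effect

1. Base effect model

$$\mathrm{HostBoost}_{i,t} = \beta_1\,\mathrm{HostCurrent}_{i,t} + \beta_2\,\mathrm{HostLegacy}_{i,t} + \beta_3\,\mathrm{EventLeverage}_{i,t}$$

2. Variable definitions and calculation

Current host effect (HostCurrent)

$$\mathrm{HostCurrent}_{i,t} = \begin{cases} 1 & \text{if country } i \text{ is the host in year } t \\ 0 & \text{otherwise} \end{cases}$$

Legacy effect (HostLegacy)

$$\mathrm{HostLegacy}_{i,t} = \sum_{k=1}^{3} \lambda^{k}\,\mathrm{HostCurrent}_{i,t-4k}, \qquad \lambda = 0.7 \text{ (empirical decay coefficient)}$$

The sum spans the previous three Games, which are held four years apart.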
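The legacy term can be sketched as a decayed look-back over the previous three Games. The sketch assumes the weight for a Games held k editions earlier is λ to the power k; the function name and its parameters are hypothetical. The usage lines take a country that hosted in 2008.

```python
def host_legacy(host_years, year, lam=0.7, n_games=3, gap=4):
    """Sum of lam ** k over the previous n_games editions in which the country hosted.

    host_years: set of years in which the country hosted (assumed input).
    """
    return sum(
        lam ** k
        for k in range(1, n_games + 1)
        if (year - gap * k) in host_years
    )

hosted = {2008}
legacy_2012 = host_legacy(hosted, 2012)   # one Games after hosting
legacy_2016 = host_legacy(hosted, 2016)   # two Games after hosting
legacy_2024 = host_legacy(hosted, 2024)   # outside the three-Games window
```

The legacy value fades geometrically and drops to zero once the hosting year falls outside the three-Games window.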

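Event leverage effect (EventLeverage)

$$\mathrm{EventLeverage}_{i,t} = \frac{\text{medals won by country } i \text{ in events newly added at this Games}}{\text{total medals awarded in events newly added at this Games}} \times \ln\!\left(1 + \text{number of newly added events for country } i\right)$$

3. Dynamic coefficient estimation

Regression model on historical data:

$$\mathrm{Medal}_{i,t} = \alpha + \beta_1\,\mathrm{HostCurrent}_{i,t} + \beta_2\,\mathrm{HostLegacy}_{i,t} + \beta_3\,\mathrm{EventLeverage}_{i,t}$$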

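Olympic athlete strength

1. Athlete Competitiveness Index (ACI)

$$\mathrm{ACI} = \text{career medals} \times \left(1 - \frac{|\text{age} - 27|}{30}\right) \times \left(\text{participation and sport-weight factor}\right)$$

The last factor increases with the athlete's number of Olympic appearances and with the sport weight.

Age fit coefficient: 27 is taken as the peak age of Olympic athletes (based on the average age of past champions).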

2. Sport Strategic Weight (SSW)

$$\mathrm{SSW} = \frac{\text{national historical medal share}}{\text{global medal share} + 0.3} \times \left(1 + \frac{\text{medal growth rate over the last two Games}}{\text{average growth rate of the sport}}\right) \times \left(1 - \frac{\text{age entropy}}{\ln N}\right)$$

Age entropy measures how dispersed the age distribution in a sport is (a low entropy indicates a sound age structure):

$$\text{Entropy(age)} = -\sum_{a} p(a)\,\ln p(a)$$

Traditional strength: 0.3 is added to the denominator to damp small-sample fluctuations.

Momentum: growth relative to the average for the sport.

Squad health: the more concentrated the age distribution (low entropy), the higher the score.

3. National Strength Index (NSI)

$$\mathrm{NSI}_{c} = \sum_{s} \frac{\mathrm{ACI}_{c,s}}{1 + H_s} + \mathrm{HostBoost}_{c}$$

$$H_s = -\sum_{c} p_{c,s}\,\ln p_{c,s}, \qquad p_{c,s} = \text{share of the medals in sport } s \text{ won by country } c$$

Competition entropy $H_s$ reflects how concentrated the medal distribution in a sport is (the lower the entropy, the clearer the advantage).

Number and variety of events entered

1. Event-count indicators

Total Participation (TP)

$$\mathrm{TP}_{i,t} = \sum_{s=1}^{S} I\left(\text{country } i \text{ has entrants in event } s\right)$$

Relative Participation Breadth (RPB)

$$\mathrm{RPB}_{i,t} = \frac{\mathrm{TP}_{i,t}}{S \times \text{total number of athletes entered}}$$

Here $S$ is the total number of events at that Games.

Significance: RPB balances event coverage against delegation size, so that small countries are not underestimated for entering few athletes.

2. Event-variety indicators

Shannon Diversity Index (SDI)

$$\mathrm{SDI} = -\sum_{s} p_s\,\ln p_s, \qquad p_s = \frac{\text{the country's entrants in event } s}{\text{the country's total entrants}}$$

Strategic Focus Index (SFI)

$$\mathrm{SFI} = \frac{\sum_{s \in S_{\text{top}}} \text{entrants in event } s}{\text{the country's total entrants at this Games}}, \qquad S_{\text{top}} = \underset{s}{\arg\text{top-}k}\left(\text{historical medal count}\right)$$

The value of $k$ is usually 3 (a best practice based on historical data).
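The two variety indicators can be sketched as below. This is an illustration under assumptions: the function names, the dictionary inputs, and the sample delegation are hypothetical, and SFI is read here as the share of a country's entrants who compete in its top-k events by historical medal count.

```python
import math

def shannon_diversity(entrants_by_event):
    """Shannon entropy of the distribution of entrants over events."""
    total = sum(entrants_by_event.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in entrants_by_event.values()
        if n > 0
    )

def strategic_focus(entrants_by_event, historical_medals, k=3):
    """Share of entrants competing in the k events with the most historical medals."""
    total = sum(entrants_by_event.values())
    if total == 0:
        return 0.0
    top = sorted(historical_medals, key=historical_medals.get, reverse=True)[:k]
    return sum(entrants_by_event.get(event, 0) for event in top) / total

# Hypothetical delegation and medal history.
entrants = {"A": 6, "B": 3, "C": 1}
history = {"A": 50, "B": 5, "C": 20}
sdi_even = shannon_diversity({"Swimming": 10, "Athletics": 10})
sfi = strategic_focus(entrants, history, k=2)
```

A high SDI with a low SFI describes a delegation spread widely across events, while a high SFI indicates that most athletes are sent to the country's historically strongest events.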