3. Models


3.1 Data preprocessing


Data filling: Analysis of the given data shows that missing values and outliers occur mainly in summerOly_programs.csv. To address this, we use the following approach:


For missing values: because only a few values are missing, we fill them with 0.
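The zero-filling step could be sketched as follows. This is a minimal illustration, not the exact procedure used: the sample rows, the column names, and the helper `fill_missing_with_zero` are hypothetical, and the sketch assumes every non-key cell is either empty or an integer.

```python
import csv
import io

# Toy stand-in for a few rows of summerOly_programs.csv.
# Column names and values are invented for illustration only.
SAMPLE = """Sport,1896,1900,1904
Aquatics,4,,9
Archery,,7,6
"""


def fill_missing_with_zero(rows, key_column):
    """Return copies of the rows with empty numeric cells replaced by 0."""
    cleaned = []
    for row in rows:
        filled = {}
        for col, val in row.items():
            if col == key_column:
                filled[col] = val  # keep the label column unchanged
            else:
                text = (val or "").strip()
                # Assumption: a non-empty cell always holds an integer.
                filled[col] = int(text) if text else 0
        cleaned.append(filled)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = fill_missing_with_zero(rows, key_column="Sport")
```

Filling with 0 treats a missing entry as "no events held", which is reasonable only while the number of missing cells stays small.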


For outliers: only a small fraction of the large dataset is affected, and the indicator is strongly correlated with the year. The outliers are therefore replaced using the mode interpolation method.
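One possible reading of the mode-based replacement is sketched below. The text does not state how outliers are detected, so the 1.5 × IQR rule (the same rule a boxplot uses) is an assumption here, as are the function name `replace_outliers_with_mode` and the sample values.

```python
from statistics import mode, quantiles


def replace_outliers_with_mode(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the mode of the rest.

    The IQR detection rule is an assumption; the source only says that
    outliers are replaced using the mode.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    fill = mode(inliers)  # most frequent non-outlier value
    return [v if low <= v <= high else fill for v in values]


# Invented event counts for one sport across several Games.
counts = [2, 2, 3, 2, 4, 3, 2, 40]
repaired = replace_outliers_with_mode(counts)
```

Computing the mode over the non-outlier values only keeps the replacement from being influenced by the points it is meant to correct.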


The cleaned data are then integrated: we extract each country's gold medal count and total medal count at past Olympic Games from summerOly_medal_counts.csv and each country's number of athletes from summerOly_athletes.csv, and we identify each country's dominant events. Figure 1 shows a boxplot of the data after this simple processing:


Figure 1: Boxplot of the given dataset


During integration, we found that the two tables represent countries differently: one uses country codes and the other uses full country names, so matching them directly could cause a series of problems. We therefore consulted an online cross-reference table of country names and country codes [1] and used it to unify the two representations, ensuring data consistency for the predictive models.
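The matching step might look like the sketch below. The lookup table `NOC_TO_NAME` holds only a few entries, the athlete and medal numbers are invented, and `merge_by_country` is a hypothetical helper; a real run would load the full cross-reference table and the two CSV files.

```python
# Small excerpt of a code-to-name lookup (illustrative, not the full table).
NOC_TO_NAME = {
    "USA": "United States",
    "CHN": "China",
    "GBR": "Great Britain",
}


def merge_by_country(athlete_counts, medal_counts, noc_to_name):
    """Join per-code athlete counts with per-name medal counts.

    Returns (merged, unmatched): merged is keyed by full country name,
    unmatched lists the codes that could not be aligned.
    """
    merged, unmatched = {}, []
    for noc, n_athletes in athlete_counts.items():
        name = noc_to_name.get(noc)
        if name is None or name not in medal_counts:
            unmatched.append(noc)  # surface mismatches instead of dropping them silently
            continue
        merged[name] = {"athletes": n_athletes, "medals": medal_counts[name]}
    return merged, unmatched


# Invented numbers, used only to exercise the function.
athletes = {"USA": 600, "CHN": 400, "XYZ": 5}
medals = {"United States": 100, "China": 90}
merged, unmatched = merge_by_country(athletes, medals, NOC_TO_NAME)
```

Returning the unmatched codes makes it easy to see which entries still need a manual mapping before modeling.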


In addition, for political reasons some countries have merged or dissolved, so their names are not consistent across Olympic history. For example, the Soviet Union dissolved into independent states in 1991: during the Soviet era its athletes competed under the name "Soviet Union", and after the dissolution each independent country competed under its own name. When unifying names, the name that applies in each time period must be made explicit: data from the Soviet period are classified as "Soviet Union", and later data are classified separately under countries such as Russia and Ukraine. Similarly, Germany was historically divided into East and West Germany, which competed as separate teams, and has competed under the name "Germany" since reunification in 1990. Processing must accurately distinguish between these historical stages.
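A simple consistency check for these historical stages could be sketched as follows. The cut-off years come from the dates mentioned above; the table `LAST_VALID_YEAR`, the function `flag_out_of_period`, and the sample records are hypothetical and only illustrate the idea.

```python
# Last year in which each historical name can still appear, based on the
# dissolution (1991) and reunification (1990) dates given in the text.
LAST_VALID_YEAR = {
    "Soviet Union": 1991,
    "East Germany": 1990,
    "West Germany": 1990,
}


def flag_out_of_period(records):
    """Return the (name, year) records that use a historical name too late."""
    flagged = []
    for name, year in records:
        limit = LAST_VALID_YEAR.get(name)
        if limit is not None and year > limit:
            flagged.append((name, year))
    return flagged


# Invented records used only to exercise the check.
records = [
    ("Soviet Union", 1988),
    ("Soviet Union", 1996),
    ("Germany", 1996),
    ("East Germany", 1992),
]
suspicious = flag_out_of_period(records)
```

Flagged records can then be reviewed and reassigned to the appropriate successor country by hand.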


In some years, some countries also entered multiple teams in the same event, which must be taken into account during data processing.


3.2 Problem Analysis Model


3.2.1 Model introduction


(1) Multiple regression prediction model


To solve the medal prediction problem, our group decided to use a multiple regression prediction model. Multiple regression extends simple regression analysis by considering the influence of several input variables on the output variable; by introducing multiple factors, it models the relationship in the system more comprehensively.


This section reviews the principles of multiple linear regression, including model building and parameter estimation. Compared with univariate linear regression, multiple linear regression generally has greater predictive and explanatory power, and can capture the combined influence of several factors on the output more accurately.


Model expressions


Let the dependent variable be Y, and let the k independent variables affecting it be $X_1, X_2, \ldots, X_k$. Assume that the effect of each independent variable on Y is linear, that is, the mean of Y varies linearly with any one independent variable while the other independent variables remain constant.
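To make the linear assumption concrete, the sketch below fits a model of the form Y = b0 + b1*X1 + ... + bk*Xk by ordinary least squares. It is a pure-Python illustration that solves the normal equations directly; in practice a numerical library would normally be used. The function name `fit_ols` and the toy data are assumptions and are not taken from the Olympic datasets.

```python
def fit_ols(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Normal equations: (A^T A) b = A^T y, stored as an augmented matrix.
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - tail) / M[i][i]
    return beta


# Toy data generated exactly from Y = 2 + 3*X1 - 1*X2 (illustrative only).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
beta = fit_ols(X, y)
```

Because the toy data contain no noise, the fitted coefficients should reproduce the generating values up to rounding error; with real data the residuals would be nonzero.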

[1] https://www.guojiadaima.com