Project: Data Processing with Python
Goal: Choose ONE of the following data processing projects according to the requirements. The quality of your work will be considered in the final marking. Show your code and screenshots, with comments, in the report and presentation.
Options:
Drink consumption data analytics: Refer to “01drinks.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame and output all rows, (2) by default, the “read_csv” function recognizes “NA” as “NaN”; try to solve this problem, (3) output the average beer/spirit/wine consumption for each continent and plot a figure, (4) sort and output the countries with the highest beer/spirit/wine consumption and plot a figure, (5) retrieve another drinks dataset and redo the previous steps, (6) describe any interesting results you found in the dataset.
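A minimal pandas sketch of steps (1)–(4), using a few made-up rows in place of “01drinks.csv” (column names assumed from the commonly used drinks dataset). The key point for step (2): “NA” here is the continent code for North America, so the default NA parsing must be disabled.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample standing in for "01drinks.csv"; the real column
# names and values may differ.
csv_text = """country,beer_servings,spirit_servings,wine_servings,continent
Canada,240,122,100,NA
France,127,151,370,EU
Germany,346,117,175,EU
Namibia,376,3,1,AF
"""

# By default pandas treats the string "NA" as missing; here "NA" is the
# North America continent code, so disable the default NA recognition.
df = pd.read_csv(StringIO(csv_text), keep_default_na=False)

# Task (3): average consumption per continent.
avg = df.groupby("continent")[
    ["beer_servings", "spirit_servings", "wine_servings"]
].mean()

# Task (4): countries with the highest beer consumption
# (avg.plot(kind="bar") / top_beer.plot(...) would visualize these).
top_beer = df.sort_values("beer_servings", ascending=False).head(3)
```

Passing `na_values=[]` together with `keep_default_na=False` is an equivalent way to keep “NA” as a real value.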
Doge coin data analytics: Refer to “02doge.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, making sure the “Date” column is a date type, (2) are there missing values in the DataFrame? If yes, try to fix them, (3) output the highest and lowest values of the coin, including the related dates, (4) plot the coin's highest price for every day, (5) redo the last step using a log scale, (6) retrieve another cryptocurrency dataset and redo the previous steps, (7) describe any interesting results you found in the dataset.
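A small sketch of steps (1)–(3), with a few hypothetical rows standing in for “02doge.csv” (the real file's columns may differ); interpolation is one reasonable choice for step (2), not the only one:

```python
import pandas as pd

# Hypothetical rows standing in for "02doge.csv".
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
    "High": [0.005, None, 0.070, 0.055],
})

# Task (1): make sure "Date" is a datetime column.
df["Date"] = pd.to_datetime(df["Date"])

# Task (2): fill the missing price by linear interpolation
# (forward-fill would be another defensible choice for a time series).
df["High"] = df["High"].interpolate()

# Task (3): highest and lowest values with their related dates.
hi = df.loc[df["High"].idxmax()]
lo = df.loc[df["High"].idxmin()]

# Tasks (4)-(5): df.plot(x="Date", y="High") and, for the log scale,
# df.plot(x="Date", y="High", logy=True).
```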
Movie data analytics: Refer to “03movies.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize movies by Genre, (3) analyze and visualize movies by Keywords, (4) analyze and visualize movies by Budget, (5) analyze and visualize movies by Language, (6) retrieve another movie dataset (size > 5000) and redo the previous steps, (7) describe any interesting results you found in the dataset.
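A sketch of the split-and-count pattern that steps (2)–(3) usually need, on made-up rows in place of “03movies.csv”. The “|” separator for multi-valued columns is an assumption; the real file may use commas or JSON-like lists instead.

```python
import pandas as pd

# Hypothetical sample for "03movies.csv".
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama"],
    "budget": [100, 0, 50],
})

# Task (1): example cleaning step - treat a zero budget as missing.
clean = df[df["budget"] > 0]

# Task (2): count movies per genre. A movie can carry several genres,
# so split the multi-valued column and explode one genre per row.
genre_counts = (df["genres"]
                .str.split("|")
                .explode()
                .value_counts())
# genre_counts.plot(kind="bar") would visualize it; the same pattern
# applies to Keywords in task (3).
```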
COVID data analytics: Refer to “04covid.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the daily cumulative cases and deaths, (3) analyze and visualize the daily new cases and new deaths, (4) for each state, analyze and visualize the cumulative cases and deaths, (5) analyze and visualize the top and bottom 10 states by cumulative cases and deaths, (6) analyze and visualize the per-person case and death percentages of all states; you should obtain the population data yourself, (7) analyze and visualize the mortality rate, i.e., deaths/cases, of all states, (8) retrieve a global COVID dataset and redo the previous steps per country, (9) analyze the data related to China, (10) describe any interesting results you found in the dataset.
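A compact sketch of steps (2)–(7), assuming “04covid.csv” holds cumulative counts per state per day (a common layout; adjust if yours differs). The population numbers below are placeholders, not real data.

```python
import pandas as pd

# Hypothetical cumulative counts standing in for "04covid.csv".
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02"] * 2),
    "state": ["NY", "NY", "CA", "CA"],
    "cases": [100, 150, 80, 120],
    "deaths": [2, 5, 1, 3],
})

# Task (2): daily national cumulative totals.
daily = df.groupby("date")[["cases", "deaths"]].sum()

# Task (3): daily new cases/deaths = difference of the cumulative totals.
new = daily.diff()

# Task (4): latest cumulative totals per state.
latest = df.sort_values("date").groupby("state").last()

# Task (6): per-person rates need a population table you supply yourself
# (the figures here are placeholders only).
pop = pd.Series({"NY": 1000, "CA": 2000}, name="population")
latest = latest.join(pop)
latest["case_pct"] = latest["cases"] / latest["population"] * 100

# Task (7): mortality rate = deaths / cases.
latest["mortality"] = latest["deaths"] / latest["cases"]
```

For task (5), `latest.nlargest(10, "cases")` and `latest.nsmallest(10, "cases")` give the top and bottom states on the real data.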
CO2 emissions data analytics: Refer to “05co2.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the total and per capita emissions of all countries, (3) analyze and visualize the total and per capita emissions of China for all years, (4) analyze and visualize the total and per capita emissions of the U.S. for all years, (5) analyze and visualize the top total and top per capita emitters of each year, (6) analyze and visualize the total historical emissions of all countries, (7) retrieve the latest carbon emission data and redo the previous steps, (8) describe any interesting results you found in the dataset.
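A sketch of steps (3), (5), and (6), assuming a long-format file (one row per country per year); if “05co2.csv” is wide, a `melt()` would come first. All numbers below are illustrative, not real emission figures.

```python
import pandas as pd

# Hypothetical long-format sample standing in for "05co2.csv".
df = pd.DataFrame({
    "country": ["China", "China", "US", "US"],
    "year": [2019, 2020, 2019, 2020],
    "co2": [10.0, 10.5, 5.3, 4.7],            # total emissions (placeholder units)
    "co2_per_capita": [7.1, 7.4, 16.1, 14.2],  # placeholder values
})

# Task (3): China's emissions across all years
# (china["co2"].plot() would chart them).
china = df[df["country"] == "China"].set_index("year")

# Task (5): the top total emitter of each year.
top = df.loc[df.groupby("year")["co2"].idxmax()]

# Task (6): total historical emissions per country.
hist = df.groupby("country")["co2"].sum().sort_values(ascending=False)
```

The same `idxmax` pattern on `co2_per_capita` gives each year's top per capita emitter.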
Music album data analytics: Refer to “06albums.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize album counts by Genre, (3) analyze and visualize album sales by Genre, (4) analyze and visualize albums and tracks per year, (5) analyze and visualize the top 5 best-selling genres for each year, (6) according to the different critics, which genre is preferred in each year? Analyze and visualize, (7) retrieve the latest album data and redo the previous steps, (8) describe any interesting results you found in the dataset.
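A sketch of steps (2), (3), and (5) on made-up rows in place of “06albums.csv” (column names assumed). `head(1)` keeps the example tiny; use `head(5)` on the real data for the top 5 genres per year.

```python
import pandas as pd

# Hypothetical sample standing in for "06albums.csv".
df = pd.DataFrame({
    "album": ["A", "B", "C", "D"],
    "genre": ["Rock", "Pop", "Rock", "Jazz"],
    "year": [2001, 2001, 2002, 2002],
    "sales": [5.0, 8.0, 3.0, 1.0],
})

# Task (2): album counts per genre.
counts = df["genre"].value_counts()

# Task (3): total sales per genre.
sales = df.groupby("genre")["sales"].sum()

# Task (5): best-selling genres per year - sum sales per (year, genre),
# sort descending, then take the head of each year group.
yearly = df.groupby(["year", "genre"], as_index=False)["sales"].sum()
top_per_year = (yearly.sort_values("sales", ascending=False)
                      .groupby("year").head(1))
```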
Earthquake data analytics: Refer to “07earthquake.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) convert the coordinates to country names, (3) for China, convert the coordinates to province names, (4) analyze and visualize earthquakes by country for all years, (5) for China, analyze and visualize earthquakes by province for all years, (6) analyze and visualize the top 30 earthquakes by magnitude across all countries, (7) analyze and visualize the top 30 earthquakes by magnitude for China, (8) retrieve the latest earthquake data and redo the previous steps, (9) describe any interesting results you found in the dataset.
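For steps (2)–(3), real reverse geocoding is best done with a dedicated library (e.g. reverse_geocoder, or geopy's Nominatim). The self-contained sketch below substitutes a toy nearest-centroid lookup so the pattern is visible without network access; the centroid coordinates are rough illustrative values, not authoritative.

```python
import pandas as pd

# Hypothetical sample standing in for "07earthquake.csv".
df = pd.DataFrame({
    "latitude": [35.7, 39.9, 51.5],
    "longitude": [139.7, 116.4, -0.1],
    "mag": [6.1, 5.2, 4.0],
})

# Toy centroid table (illustrative coordinates only).
centroids = {"Japan": (36.2, 138.3), "China": (35.9, 104.2), "UK": (54.0, -2.0)}

def nearest_country(lat, lon):
    # Squared-distance nearest neighbour over the centroid table;
    # a real solution would use a reverse-geocoding library instead.
    return min(centroids,
               key=lambda c: (centroids[c][0] - lat) ** 2
                           + (centroids[c][1] - lon) ** 2)

df["country"] = [nearest_country(la, lo)
                 for la, lo in zip(df["latitude"], df["longitude"])]

# Task (6): top earthquakes by magnitude (use n=30 on the real data).
top = df.nlargest(2, "mag")
```

Step (3) is the same idea with a province-level centroid table (or a province-aware geocoder) applied to the China subset.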
E-Commerce data analytics: Refer to “08ecomm.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the number of customers by country, (3) analyze and visualize the total sales by country, (4) analyze and visualize the best-selling products, (5) analyze and visualize sales per day, per week, and per month, (6) retrieve another e-commerce dataset and redo the previous steps, (7) describe any interesting results you found in the dataset.
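A sketch of steps (2)–(5), assuming Online-Retail-style column names for “08ecomm.csv” (the real file may differ). Resampling handles the per-day/week/month breakdown in step (5).

```python
import pandas as pd

# Hypothetical sample standing in for "08ecomm.csv".
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-08"]),
    "CustomerID": [1, 2, 1],
    "Country": ["UK", "France", "UK"],
    "Description": ["MUG", "MUG", "LAMP"],
    "Quantity": [2, 3, 1],
    "UnitPrice": [5.0, 5.0, 20.0],
})
df["Sales"] = df["Quantity"] * df["UnitPrice"]

# Task (2): distinct customers per country (nunique, not a row count,
# since one customer can place many orders).
customers = df.groupby("Country")["CustomerID"].nunique()

# Task (3): total sales per country.
sales = df.groupby("Country")["Sales"].sum()

# Task (4): best-selling products by quantity.
best = df.groupby("Description")["Quantity"].sum().sort_values(ascending=False)

# Task (5): sales per week via resampling; "D" and "ME" give the
# daily and monthly views the same way.
weekly = df.set_index("InvoiceDate")["Sales"].resample("W").sum()
```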
Stock market data analytics: Refer to “09stock.zip” (SP500, NASDAQ100) and finish the following tasks: (1) using pandas, read the CSV files as DataFrames, applying data cleaning if necessary, (2) analyze and visualize the prices of the two stock indices, (3) analyze and visualize the trading volumes of the two stock indices, (4) analyze and visualize the annual returns of the two stock indices, (5) analyze and visualize the Sharpe ratios of the two stock indices, (6) retrieve the SS300 & CHINEXT indices, redo the previous steps, and compare them with the SP500 & NASDAQ100, (7) describe any interesting results you found in the dataset.
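A sketch of steps (4)–(5) for one index, on a handful of invented daily closes in place of the files inside “09stock.zip”. Conventions assumed: a zero risk-free rate and 252 trading days per year for the annualized Sharpe ratio.

```python
import pandas as pd
import numpy as np

# Hypothetical daily closing prices for one index.
close = pd.Series(
    [100.0, 101.0, 99.5, 102.0, 103.5],
    index=pd.date_range("2021-01-04", periods=5, freq="B"),
)

# Daily simple returns.
ret = close.pct_change().dropna()

# Task (4): annual return by compounding the daily returns within each year.
annual = (1 + ret).groupby(ret.index.year).prod() - 1

# Task (5): annualized Sharpe ratio, assuming a zero risk-free rate
# and 252 trading days per year.
sharpe = ret.mean() / ret.std() * np.sqrt(252)
```

Running the same computation on both indices and comparing the two `annual` series and `sharpe` values covers the comparison the tasks ask for.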
Data science job analytics: Refer to “10dsjob.csv” and finish the following tasks: (1) using pandas, read the CSV file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the min/max salaries of all jobs, (3) analyze and visualize: which state offers better salaries? (4) analyze and visualize: which company has a better reputation? (5) analyze and visualize: considering both salary and company reputation, which job is best for you? (6) retrieve job data from China and redo the previous steps, (7) describe any interesting results you found in the dataset.
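A sketch of steps (2), (3), and (5), assuming Glassdoor-style “$50K-$90K” salary strings and “City, ST” locations in “10dsjob.csv” (adjust the regex to the real format). The equal-weight score in step (5) is an arbitrary illustrative choice, not a prescribed method.

```python
import pandas as pd

# Hypothetical sample standing in for "10dsjob.csv".
df = pd.DataFrame({
    "Job Title": ["Data Scientist", "ML Engineer", "Analyst"],
    "Salary Estimate": ["$50K-$90K", "$80K-$120K", "$40K-$60K"],
    "Location": ["New York, NY", "San Jose, CA", "Albany, NY"],
    "Rating": [4.1, 4.5, 3.2],
})

# Task (2): extract numeric min/max salaries from the text range.
sal = df["Salary Estimate"].str.extract(r"\$(\d+)K-\$(\d+)K").astype(int)
df["min_salary"], df["max_salary"] = sal[0], sal[1]
df["avg_salary"] = (df["min_salary"] + df["max_salary"]) / 2

# Task (3): which state pays better? Take the state code after the comma.
df["state"] = df["Location"].str.split(", ").str[-1]
by_state = df.groupby("state")["avg_salary"].mean().sort_values(ascending=False)

# Task (5): a simple combined score of normalized salary and rating
# (equal weights - an arbitrary choice for illustration).
df["score"] = (df["avg_salary"] / df["avg_salary"].max()
               + df["Rating"] / 5) / 2
best = df.loc[df["score"].idxmax(), "Job Title"]
```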