DS 2043 Data Processing Workshop I

2024 Fall Project

The project work accounts for 50% of your overall mark in total:

Project work (report with code and figures) – 40%

Project presentation (slides with presentation) – 10%

Project group

You will form a group and work together to finish the project, so cooperation is crucial to its success. However, you should clearly specify the role and contribution percentage of each member in the project report. Grading will consider the overall result of the teamwork, as well as the workload, output quality, and presentation of each member in the team.

Project deliverable

Code with proper comments.

A written report in PDF format named “GROUP_NO_GROUP_NAME.pdf” recording what you did, how you did it, and what result you got in each step. Please include important code and screenshots with comments.

Prepare a presentation that clearly lists the contribution percentage of each member.

Zip your code, report, and presentation as GROUP_NO_GROUP_NAME.zip and submit to iSpace.

Project: Data Processing with Python

Goal: Choose ONE of the following data processing projects according to the requirements. Your work quality will be considered in the final marking. Show your code and screenshots with comments in the report and presentation.

Options:

Drink consumption data analytics: Refer to “01drinks.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame and output all rows, (2) when using the “read_csv” function, pandas will recognize “NA” as “NaN”; try to solve this problem, (3) output the average beer/spirit/wine consumption for each continent, and plot some figures, (4) sort and output the highest-consumption countries for beer/spirit/wine, and plot some figures, (5) retrieve another drinks dataset and redo the previous steps, (6) describe any interesting result you found in the dataset.
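A minimal pandas sketch of steps (1)–(4), using a small inline sample in place of “01drinks.csv” (the column names are assumptions). The pitfall in step (2) is that read_csv treats the literal string “NA” as missing by default; keep_default_na=False disables that.

```python
import io
import pandas as pd

# Inline sample standing in for "01drinks.csv"; column names are assumed.
csv_text = """country,beer_servings,spirit_servings,wine_servings,continent
USA,249,158,84,NA
Canada,240,122,100,NA
France,127,151,370,EU
Germany,346,117,175,EU
"""

# Default read: the continent code "NA" is interpreted as missing (NaN).
naive = pd.read_csv(io.StringIO(csv_text))

# (2) keep_default_na=False keeps "NA" as a literal string.
drinks = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# (3) average consumption per continent.
avg = drinks.groupby("continent")[
    ["beer_servings", "spirit_servings", "wine_servings"]].mean()

# (4) countries sorted by beer consumption; a bar chart would follow the
# same pattern, e.g. avg.plot.bar().
top_beer = drinks.sort_values("beer_servings", ascending=False)
```

The same groupby/sort pattern covers the spirit and wine columns.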

Doge coin data analytics: Refer to “02doge.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame and make sure the “Date” column is of date type, (2) are there missing values in the DataFrame? If yes, try to fix them, (3) output the highest and lowest values of the coin, including the related dates, (4) plot the daily highest price of the coin, (5) redo the last step using a log scale, (6) retrieve another cryptocurrency dataset and redo the previous steps, (7) describe any interesting result you found in the dataset.
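A sketch of steps (1)–(5), again with an inline sample in place of “02doge.csv” (the OHLC column names are assumptions); forward-fill is one simple choice for the missing values in step (2).

```python
import io
import pandas as pd

# Inline sample standing in for "02doge.csv"; columns are assumed.
csv_text = """Date,Open,High,Low,Close
2021-01-01,0.0047,0.0048,0.0046,0.0047
2021-01-02,0.0047,0.0101,0.0046,0.0099
2021-01-03,0.0099,,0.0090,0.0095
"""

# (1) parse_dates makes "Date" a real datetime column.
doge = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# (2) count and fill missing values.
n_missing = int(doge["High"].isna().sum())
doge["High"] = doge["High"].ffill()

# (3) highest value of the coin together with its date.
highest = doge.loc[doge["High"].idxmax()]

# (4)/(5) a daily-high plot, optionally log-scaled:
# doge.plot(x="Date", y="High", logy=True)
```

The lowest value works symmetrically with idxmin().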

Movie data analytics: Refer to “03movies.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize movies by Genre, (3) analyze and visualize movies by Keywords, (4) analyze and visualize movies by Budget, (5) analyze and visualize movies by Language, (6) retrieve another movie dataset (size > 5000) and redo the previous steps, (7) describe any interesting result you found in the dataset.
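A sketch of the per-Genre/Budget/Language breakdowns, assuming a pipe-separated “genres” column (the column names and separator are assumptions); the split-then-explode idiom gives one row per genre so counting becomes a plain value_counts.

```python
import io
import pandas as pd

# Inline sample standing in for "03movies.csv"; the pipe-separated
# "genres" format and the other column names are assumptions.
csv_text = """title,genres,budget,language
Inception,Action|Sci-Fi,160000000,en
Amelie,Comedy|Romance,10000000,fr
Alien,Sci-Fi|Horror,11000000,en
"""
movies = pd.read_csv(io.StringIO(csv_text))

# (2) one row per genre via split + explode, then count.
genre_counts = movies["genres"].str.split("|").explode().value_counts()

# (4)/(5) budget and language summaries; .plot.bar() would visualize them.
budget_by_lang = movies.groupby("language")["budget"].mean()
lang_counts = movies["language"].value_counts()
```

A keywords column, if present in the same format, is handled by the identical explode pattern.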

COVID data analytics: Refer to “04covid.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the daily totals of all cases and all deaths, (3) analyze and visualize daily new cases and new deaths, (4) for each state, analyze and visualize all cases and all deaths, (5) analyze and visualize the top and bottom 10 states by all cases and all deaths, (6) analyze and visualize all states by per-person case percentage and death percentage; you should obtain the population data yourself, (7) analyze and visualize all states by mortality rate, i.e., deaths/cases, (8) retrieve a global COVID dataset and redo the previous steps per country, (9) analyze the data related to China, (10) describe any interesting result you found in the dataset.
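A sketch of steps (3)–(7), assuming the file holds cumulative counts per state per day (the layout and column names are assumptions); daily new cases are then the per-state day-over-day difference, and per-capita rates need an external population table, here a hypothetical two-row stand-in.

```python
import io
import pandas as pd

# Inline sample standing in for "04covid.csv"; cumulative counts per
# state per day are assumed.
csv_text = """date,state,cases,deaths
2021-01-01,CA,100,2
2021-01-02,CA,150,3
2021-01-01,TX,80,1
2021-01-02,TX,120,4
"""
covid = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
covid = covid.sort_values(["state", "date"])

# (3) daily new cases = day-over-day difference of the cumulative count;
# the first day of each state falls back to its cumulative value.
covid["new_cases"] = covid.groupby("state")["cases"].diff().fillna(covid["cases"])

# (5)-(7) latest totals per state, joined with hypothetical populations.
latest = covid.groupby("state").last()
pop = pd.DataFrame({"state": ["CA", "TX"],
                    "population": [39_500_000, 29_000_000]}).set_index("state")
latest = latest.join(pop)
latest["case_pct"] = latest["cases"] / latest["population"] * 100
latest["mortality"] = latest["deaths"] / latest["cases"]
```

Sorting latest by a column and slicing head(10)/tail(10) gives the top/bottom-10 views.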

CO2 emissions data analytics: Refer to “05co2.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the total and per-capita emissions of all countries, (3) analyze and visualize the total and per-capita emissions of China for all years, (4) analyze and visualize the total and per-capita emissions of the U.S. for all years, (5) analyze and visualize the top total and top per-capita emissions of each year, (6) analyze and visualize the total historical emissions of all countries, (7) retrieve the latest carbon emission data and redo the previous steps, (8) describe any interesting result you found in the dataset.
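A sketch of steps (3), (5), and (6), with an inline sample in place of “05co2.csv”; the country/year/co2/co2_per_capita column names (and the illustrative figures) are assumptions. The per-year top emitter falls out of groupby + idxmax.

```python
import io
import pandas as pd

# Inline sample standing in for "05co2.csv"; columns and values are
# assumed for illustration only.
csv_text = """country,year,co2,co2_per_capita
China,2020,10660,7.41
China,2021,11470,8.05
United States,2020,4710,14.24
United States,2021,5010,15.32
Qatar,2021,110,35.60
"""
co2 = pd.read_csv(io.StringIO(csv_text))

# (3) one country's emissions over time (plot with china["co2"].plot()).
china = co2[co2["country"] == "China"].set_index("year")

# (5) top emitter and top per-capita emitter of each year.
top_total = co2.loc[co2.groupby("year")["co2"].idxmax()].set_index("year")
top_percap = co2.loc[co2.groupby("year")["co2_per_capita"].idxmax()].set_index("year")

# (6) total historical emissions per country.
hist_total = co2.groupby("country")["co2"].sum().sort_values(ascending=False)
```

Step (4) repeats the China slice with "United States".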

Music album data analytics: Refer to “06albums.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the number of albums by Genre, (3) analyze and visualize album sales by Genre, (4) analyze and visualize albums and tracks per year, (5) analyze and visualize the top 5 best-selling genres for each year, (6) according to the different critics, which genre is preferred each year? Analyze and visualize, (7) retrieve the latest album data and redo the previous steps, (8) describe any interesting result you found in the dataset.
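A sketch of steps (2), (3), and (5), assuming genre/year/sales columns (an assumption about the file); the per-year top-5 comes from summing per (year, genre), sorting, and keeping the head of each year group.

```python
import io
import pandas as pd

# Inline sample standing in for "06albums.csv"; columns are assumed.
csv_text = """album,genre,year,sales
A,Rock,2020,500
B,Pop,2020,800
C,Rock,2021,300
D,Jazz,2021,200
E,Pop,2021,900
"""
albums = pd.read_csv(io.StringIO(csv_text))

# (2)/(3) album counts and total sales by genre.
count_by_genre = albums["genre"].value_counts()
sales_by_genre = albums.groupby("genre")["sales"].sum()

# (5) top 5 best-selling genres per year: sum per (year, genre), sort
# within each year by sales, keep at most five rows per year.
top_per_year = (albums.groupby(["year", "genre"], as_index=False)["sales"].sum()
                      .sort_values(["year", "sales"], ascending=[True, False])
                      .groupby("year").head(5))
```

Step (6) would apply the same groupby pattern to whichever critic-score columns the real file contains.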

Earthquake data analytics: Refer to “07earthquake.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) convert the coordinates to country names, (3) for China, convert the coordinates to province names, (4) analyze and visualize earthquakes by country for all years, (5) for China, analyze and visualize earthquakes by province for all years, (6) analyze and visualize the top 30 earthquakes by magnitude for all countries, (7) analyze and visualize the top 30 earthquakes by magnitude for China, (8) retrieve the latest earthquake data and redo the previous steps, (9) describe any interesting result you found in the dataset.
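Steps (2)–(3) need reverse geocoding, for which a library such as reverse_geocoder or geopandas with country polygons would normally be used (an assumption about tooling); the sketch below substitutes a toy bounding-box lookup so the rest of the pipeline stays visible. Column names and coordinates are assumptions.

```python
import io
import pandas as pd

# Inline sample standing in for "07earthquake.csv"; columns are assumed.
csv_text = """time,latitude,longitude,mag
2023-01-05,35.6,103.8,5.9
2023-02-06,37.2,37.0,7.8
2023-03-01,30.5,104.1,4.2
"""
quakes = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

def lookup_country(lat, lon):
    # Toy bounding boxes standing in for a real reverse geocoder; the
    # boxes are rough illustrations, not accurate borders.
    if 18 <= lat <= 54 and 73 <= lon <= 135:
        return "China"
    if 36 <= lat <= 42 and 26 <= lon <= 45:
        return "Turkey"
    return "Unknown"

# (2) map each (lat, lon) pair to a country name.
quakes["country"] = [lookup_country(la, lo)
                     for la, lo in zip(quakes["latitude"], quakes["longitude"])]

# (4) earthquakes per country; (6) strongest quakes first.
by_country = quakes.groupby("country").size()
top = quakes.sort_values("mag", ascending=False).head(30)
```

Province-level lookup for step (3) is the same idea with province polygons instead of country boxes.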

E-Commerce data analytics: Refer to “08ecomm.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the number of customers by country, (3) analyze and visualize total sales by country, (4) analyze and visualize the best-selling products, (5) analyze and visualize sales per day, per week, and per month, (6) retrieve another e-commerce dataset and redo the previous steps, (7) describe any interesting result you found in the dataset.
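A sketch of steps (2)–(5); the column names loosely follow the UCI Online Retail layout, which is an assumption about the file. Distinct customers per country is nunique, and period-level sales use dt.to_period.

```python
import io
import pandas as pd

# Inline sample standing in for "08ecomm.csv"; columns are assumed.
csv_text = """InvoiceDate,CustomerID,Country,Product,Quantity,UnitPrice
2021-01-03,c1,UK,mug,2,5.0
2021-01-03,c2,UK,lamp,1,20.0
2021-01-10,c3,France,mug,3,5.0
2021-02-01,c1,UK,mug,1,5.0
"""
orders = pd.read_csv(io.StringIO(csv_text), parse_dates=["InvoiceDate"])
orders["Sales"] = orders["Quantity"] * orders["UnitPrice"]

# (2) distinct customers per country, (3) revenue per country.
cust_by_country = orders.groupby("Country")["CustomerID"].nunique()
sales_by_country = orders.groupby("Country")["Sales"].sum()

# (4) best sellers by total quantity.
best_sellers = orders.groupby("Product")["Quantity"].sum().sort_values(ascending=False)

# (5) monthly sales; daily/weekly work the same with to_period("D"/"W").
monthly = orders.groupby(orders["InvoiceDate"].dt.to_period("M"))["Sales"].sum()
```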

Stock market data analytics: Refer to “09stock.zip” (SP500, NASDAQ100) and finish the following tasks: (1) using pandas, read the csv files as DataFrames, applying data cleaning if necessary, (2) analyze and visualize the prices of the two stock indices, (3) analyze and visualize the trading volumes of the two stock indices, (4) analyze and visualize the annual returns of the two stock indices, (5) analyze and visualize the Sharpe ratios of the two stock indices, (6) retrieve the SS300 & CHINEXT indices, redo the previous steps, and compare with the SP500 & NASDAQ100, (7) describe any interesting result you found in the dataset.
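A sketch of the return and Sharpe-ratio calculations (steps (4)–(5)), assuming a single file with one price column per index, a 0% risk-free rate, and 252 trading days per year (all assumptions); the sample prices are illustrative only.

```python
import io
import pandas as pd

# Inline sample standing in for the files in "09stock.zip"; one price
# column per index is an assumption about the layout.
csv_text = """Date,SP500,NASDAQ100
2021-01-04,3700,12700
2021-01-05,3737,12830
2021-01-06,3748,12600
2021-01-07,3803,12940
"""
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"]).set_index("Date")

# (4) simple daily returns, and the total return over the window.
rets = prices.pct_change().dropna()
total_return = prices.iloc[-1] / prices.iloc[0] - 1

# (5) annualised Sharpe ratio: mean / std of daily returns, scaled by
# sqrt(252), assuming a 0% risk-free rate.
sharpe = rets.mean() / rets.std() * (252 ** 0.5)
```

Annual (rather than window) returns come from resampling prices to year-end before taking pct_change.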

Data science job analytics: Refer to “10dsjob.csv” and finish the following tasks: (1) using pandas, read the csv file as a DataFrame, applying data cleaning if necessary, (2) analyze and visualize the min/max salaries of all jobs, (3) analyze and visualize which state offers better salaries, (4) analyze and visualize which company has a better reputation, (5) analyze and visualize which job is best for you, considering both salary and company reputation, (6) retrieve job data from China and redo the previous steps, (7) describe any interesting result you found in the dataset.
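A sketch of steps (2), (3), and (5), assuming salary ranges come as strings like “$90K-$130K” (the format, column names, and sample rows are assumptions); a regex extract turns the range into numeric columns, and the combined score with equal weights is a design choice for illustration, not part of the assignment.

```python
import io
import pandas as pd

# Inline sample standing in for "10dsjob.csv"; the "$90K-$130K" salary
# format and the column names are assumptions.
csv_text = """Job Title,Salary Estimate,Rating,Location,Company
Data Scientist,$90K-$130K,4.2,CA,Acme
Data Analyst,$50K-$80K,3.5,TX,Beta
ML Engineer,$110K-$150K,4.0,CA,Gamma
"""
jobs = pd.read_csv(io.StringIO(csv_text))

# (2) parse the salary range into numeric min/max/mid columns.
sal = jobs["Salary Estimate"].str.extract(r"\$(\d+)K-\$(\d+)K").astype(int)
jobs["min_salary"] = sal[0] * 1000
jobs["max_salary"] = sal[1] * 1000
jobs["mid_salary"] = (jobs["min_salary"] + jobs["max_salary"]) / 2

# (3) average mid-range salary per state.
by_state = jobs.groupby("Location")["mid_salary"].mean()

# (5) toy score mixing normalised salary and normalised rating; the
# equal weighting is an illustrative design choice.
jobs["score"] = (jobs["mid_salary"] / jobs["mid_salary"].max()
                 + jobs["Rating"] / jobs["Rating"].max())
best = jobs.loc[jobs["score"].idxmax(), "Job Title"]
```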
