GGES6021 Coursework
Basic Statistical Skills in R
Deadline: Thursday 21st November at 16:00
Submit through eAssignment as a Word/PDF document
For your assignment, you will be investigating the factors influencing species richness of anurans (frogs and toads). The data we will be using are from work carried out by Felix Eigenbrod during his PhD. The .csv file containing the data includes site ID (site), species richness for each site (species_richness), the total amount of forest edge in metres (total_edge_m), percentage forest cover (percent_forest), the density of roads (road_density) and the density of traffic (traffic_density). Site ID and species richness are measured at the site level; all other variables are measured within a 500m buffer around each site.
Fig. 1 - Study area, showing all sites (n = 36) surrounded by a buffer 1500 m in radius.
You will be answering the question: What are the main landscape-scale factors driving anuran species richness in Ottawa?
Overleaf are the four main tasks to complete in order to demonstrate your understanding of the fundamentals of summative statistics and regression. This assignment must be completed in the R software. You need to give your methodology at every stage. Feel free to copy and paste pieces of your R analysis into this document. R code is not included in the word count.
Note that each major question carries points (e.g., 20% = 20 points) showing its value in the overall assignment. See GGES6021 Blackboard > Assignments > Marking Criteria > MSc (Level 7) Generic Marking Criteria Sheet.
Data for this exercise
The data for this exercise are found on the Blackboard site, in Assignments > Assignment 2: R Statistics Coursework. The file containing the data is called ‘anuran_species_richness.csv’.
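As a starting point only, the sketch below shows one way to read the file into R and check that the columns listed in the brief are present. The helper name `load_anuran` is hypothetical, and the usage lines assume you have saved the file in your working directory.

```r
# Column names as listed in the assignment brief
expected_cols <- c("site", "species_richness", "total_edge_m",
                   "percent_forest", "road_density", "traffic_density")

# Hypothetical helper: read the CSV and stop early if a column is missing
load_anuran <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(expected_cols, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Usage, assuming the file is in the working directory:
# anuran <- load_anuran("anuran_species_richness.csv")
# str(anuran)
```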
In order to answer the question, you will need to perform a multiple linear regression and all associated exploratory and model validation tasks. The practical from week 5 will provide a guide to doing this. Remember to also use the links given (for R and Stats resources) on Blackboard, as well as your own searches for additional guidance: one of the aims of this task is to allow you to practise and become more confident in teaching yourself new skills and software.
Ensure that the four tasks below are completed within this work. In the written report you submit, include the results and any relevant figures and/or tables you have prepared, and address the questions posed in the tasks below. You will also need to write a brief introduction and conclusion supporting your analysis (5%, maximum 400 words), setting out the context and the research aim and summarising the key findings. Use the task names below as headings to identify each section clearly.
Tasks
Summative statistics (10%, maximum 250 words)
Produce a table giving the summative statistics (measures of central tendency and dispersion) for each variable. Illustrate these with plots showing the distributions of the variables.
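A minimal sketch of one possible layout for such a table, using base R only. The data frame here is filled with randomly generated placeholder values (an assumption made purely so the code runs on its own); these are not the real data, so replace that step with your own `read.csv()` call.

```r
# Placeholder data with the brief's column names; NOT the real values
set.seed(1)
n <- 36
anuran <- data.frame(
  site             = seq_len(n),
  species_richness = rpois(n, lambda = 5),
  total_edge_m     = runif(n, 1000, 20000),
  percent_forest   = runif(n, 0, 100),
  road_density     = runif(n, 0, 5),
  traffic_density  = rexp(n, rate = 1 / 2000)
)
vars <- setdiff(names(anuran), "site")

# Apply one summary function to every variable and drop the names
stat <- function(f) unname(sapply(anuran[vars], f))

# One row per variable: central tendency (mean, median)
# and dispersion (sd, IQR, min, max)
summary_table <- data.frame(
  variable = vars,
  mean     = stat(mean),
  median   = stat(median),
  sd       = stat(sd),
  iqr      = stat(IQR),
  min      = stat(min),
  max      = stat(max)
)
print(summary_table, digits = 3)

# One histogram per variable to show its distribution
op <- par(mfrow = c(2, 3))
for (v in vars) hist(anuran[[v]], main = v, xlab = v)
par(op)
```

Histograms are only one option; boxplots or density plots may suit some variables better, and the choice of statistics to report is yours to justify.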
Bivariate analysis (20%, maximum 350 words)
Perform a correlation analysis between all the variables. You need to choose a method for the correlation analysis (Pearson, Kendall or Spearman) and justify it. Illustrate the analysis with appropriate plots. Which are the most correlated variables? Which variables will you use to model species richness, and why? Will you need to transform any of your variables?
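The sketch below illustrates the mechanics of a correlation analysis in base R. Spearman is used purely as an example, not as a recommendation: the choice of method and its justification are part of the task. The data are randomly generated placeholders (an assumption so the code runs on its own), not the real values.

```r
# Placeholder data with the brief's column names; NOT the real values
set.seed(1)
n <- 36
anuran <- data.frame(
  site             = seq_len(n),
  species_richness = rpois(n, lambda = 5),
  total_edge_m     = runif(n, 1000, 20000),
  percent_forest   = runif(n, 0, 100),
  road_density     = runif(n, 0, 5),
  traffic_density  = rexp(n, rate = 1 / 2000)
)
vars <- setdiff(names(anuran), "site")

# Shapiro-Wilk p-values: one piece of evidence when choosing a method
normality_p <- sapply(anuran[vars], function(x) shapiro.test(x)$p.value)
print(round(normality_p, 3))

# Correlation matrix; swap in "pearson" or "kendall" as appropriate
cor_mat <- cor(anuran[vars], method = "spearman")
print(round(cor_mat, 2))

# Significance test for a single pair of variables
# (exact = FALSE avoids a warning when there are tied ranks)
print(cor.test(anuran$species_richness, anuran$percent_forest,
               method = "spearman", exact = FALSE))

# Scatterplot matrix of all pairs
pairs(anuran[vars])
```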
Multivariate regression (35%, maximum 600 words)
Perform multivariate linear regression using model selection with species richness as your dependent variable. Explain how you arrived at your final model. Report the results of the regression analysis, ensuring you discuss the meaning of the coefficients, their uncertainty and significance, and the general explanatory power of the model.
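One possible workflow is sketched below, assuming backward selection by AIC with `step()`; other selection strategies are equally valid and the brief does not prescribe one. The data are randomly generated placeholders with no built-in relationship between variables, so the fitted coefficients are meaningless and selection may drop every predictor.

```r
# Placeholder data with the brief's column names; NOT the real values
set.seed(1)
n <- 36
anuran <- data.frame(
  site             = seq_len(n),
  species_richness = rpois(n, lambda = 5),
  total_edge_m     = runif(n, 1000, 20000),
  percent_forest   = runif(n, 0, 100),
  road_density     = runif(n, 0, 5),
  traffic_density  = rexp(n, rate = 1 / 2000)
)

# Full model with every candidate predictor
full_model <- lm(species_richness ~ total_edge_m + percent_forest +
                   road_density + traffic_density, data = anuran)

# Backward selection by AIC (one option among several)
final_model <- step(full_model, direction = "backward", trace = 0)

# Coefficients, standard errors, p-values, R-squared and F-test
print(summary(final_model))

# 95% confidence intervals for the coefficients
print(confint(final_model))

# Compare the full and selected models
print(AIC(full_model, final_model))
```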
Regression diagnostics (30%, maximum 400 words)
Perform the diagnostics for the remaining assumptions on the residuals of the final model.
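A minimal sketch of some common residual checks in base R. The model formula is a stand-in (an assumption, not a suggested final model), the data are randomly generated placeholders rather than the real values, and the set of checks is illustrative rather than exhaustive.

```r
# Placeholder data with the brief's column names; NOT the real values
set.seed(1)
n <- 36
anuran <- data.frame(
  site             = seq_len(n),
  species_richness = rpois(n, lambda = 5),
  total_edge_m     = runif(n, 1000, 20000),
  percent_forest   = runif(n, 0, 100),
  road_density     = runif(n, 0, 5),
  traffic_density  = rexp(n, rate = 1 / 2000)
)

# Stand-in for your own final model
predictors  <- c("percent_forest", "road_density")
final_model <- lm(reformulate(predictors, response = "species_richness"),
                  data = anuran)
res <- residuals(final_model)

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(final_model)
par(op)

# Normality of the residuals
print(shapiro.test(res))

# Variance inflation factors computed by hand: regress each predictor
# on the others and take 1 / (1 - R^2)
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = anuran))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```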