
Preamble

In our previous sessions, we used linear models to explore the relationship between multiple predictors (X₁, X₂, ...) and a continuous response Y. However, the assumptions of the linear model are not always well suited to the data that we are interested in. For example, if our response Y is a count variable (e.g., number of flies per cow, number of seeds in a 1-m² plot), the observations are non-negative integers, so the mean μ must be non-negative. If our response Y is a binary variable (success/failure, true/false, present/absent, dead/alive, which we often code as 0/1), the mean μ represents the probability of success and must therefore be constrained between 0 and 1.

In addition, the distribution of observations Y around μ does not always follow a Normal distribution. For example, count data are strictly discrete (i.e., Y can only take integer values 0, 1, 2, 3, ...), whereas binary data can only take two values (0 or 1). When the response data are discrete or binary, we need to use discrete or binary distributions to model them in our analyses.
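
To make the contrast concrete, the short R sketch below draws a few values from a Normal, a Poisson, and a single-trial Binomial distribution. The mean of 2.5, the probability of 0.3, and the sample size are arbitrary values chosen for illustration; they are assumptions and do not come from the prac data.

```r
# Illustrative sketch: all parameter values are arbitrary assumptions.
set.seed(1)                                    # make the random draws reproducible
n <- 10

y_normal <- rnorm(n, mean = 2.5, sd = 1)       # continuous values, may be negative
y_count  <- rpois(n, lambda = 2.5)             # non-negative integers only
y_binary <- rbinom(n, size = 1, prob = 0.3)    # 0 or 1 only

y_normal
y_count
y_binary
```

Printing the three vectors side by side shows why a Normal distribution is a poor description of count or binary observations.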

The linear model that we have used so far allows the mean μ to range from negative infinity to positive infinity and assumes that the residuals follow a Normal distribution. However, when the response data do not map onto that range or cannot produce normally distributed residuals, this can cause problems. Generalized Linear Models (GLMs) generalize the linear model to deal with these cases.
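
One way to see the kind of problem this can cause is to fit an ordinary linear model to simulated count data. The sketch below is only an illustration: the sample size, the range of `x`, and the coefficients used to simulate the counts are all assumed values, not taken from this prac.

```r
# Illustrative sketch with simulated counts; all numbers are assumptions.
set.seed(42)
n <- 200
x <- runif(n, min = 0, max = 10)
y <- rpois(n, lambda = exp(-1 + 0.4 * x))   # counts whose mean rises with x

fit_lm <- lm(y ~ x)                          # ordinary linear model

# Fitted means at the low end of x; nothing in the linear model
# prevents these from dropping below zero, which is impossible for a count.
predict(fit_lm, newdata = data.frame(x = c(0, 0.5, 1)))
```

Because the linear model places no constraint on μ, its fitted line can cross zero even though every observed count is zero or larger.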

In this session we will introduce you to the use of GLMs using count data and binary data. Upon completion of this session, you should be able to:

  1. generate and interpret generalized linear models for count and binary data.

  2. understand how to interpret the models and relate that to their biological meaning.


Assessment

This prac will be assessed. There are 5 questions worth 12 points at the end of the prac. You will need to provide answers for each of the questions. Your answers may require written text, R code and R output, R graphics, or some combination of these.


Revisiting the linear model

By now you should all be familiar with the equation for a linear model:

$$Y = \beta_0 + \beta_1 X + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

To better understand the link between the linear model and the generalized linear model, it is useful to re-write the linear model shown above using the following set of three equations:

  1. $Y \sim N(\mu, \sigma^2)$

This states that the observations Y follow a Normal distribution with mean μ and variance σ².


  2. $\mu = f(lp)$

The mean (μ) is a function of an intermediate quantity called the linear predictor, lp. In the specific case of the linear model, the function f is the identity function; that is, μ=lp.


  3. $lp = \beta_0 + \beta_1 X_1$

The linear predictor, lp, is the linear combination of the predictors represented as the equation of the line that we are familiar with.


Thus the mean of Y, μ, equals the linear predictor lp, which equals $\beta_0 + \beta_1 X_1$. Equivalently:

$$Y \sim N(\beta_0 + \beta_1 X_1, \sigma^2)$$
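
The three-equation form of the linear model can be sketched in R by simulating data one equation at a time and then fitting the model. The parameter values below (`beta0`, `beta1`, `sigma`), the sample size, and the range of `x1` are assumptions made up for illustration. Fitting the same data with `glm()` and a Gaussian family with an identity link is one way to see that the linear model is a special case of the GLM.

```r
# Illustrative sketch: parameter values are arbitrary assumptions.
set.seed(123)
n     <- 100
beta0 <- 2        # intercept
beta1 <- 0.5      # slope
sigma <- 1        # residual standard deviation
x1    <- runif(n, min = 0, max = 10)

lp <- beta0 + beta1 * x1                 # third equation: linear predictor
mu <- lp                                 # second equation: f is the identity
y  <- rnorm(n, mean = mu, sd = sigma)    # first equation: Normal scatter around mu

fit_lm  <- lm(y ~ x1)
fit_glm <- glm(y ~ x1, family = gaussian(link = "identity"))

coef(fit_lm)     # estimates of beta0 and beta1
coef(fit_glm)    # the same model fitted through glm()
```

The two sets of coefficients should agree, and both should sit close to the values used to simulate the data.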


Generalising the linear model: Generalized Linear Model

The Generalized Linear Model (GLM) generalises the linear model to accommodate different types of response variables. While we keep the linear combination of predictors (the third of the three equations above), we allow for different stochastic distributions of observations Y around the mean μ, and we allow for different functions linking μ to the lp:

  1. $Y \sim \text{Distribution}(\mu, \ldots)$

This says that the observations Y follow a probability distribution with mean μ. There is a range of common distributions for GLMs, including the Poisson and Binomial distributions, which we discuss below.

  2. $\mu = f(lp)$

There is a function linking μ to lp. This maps the