Sensitivity and specificity

From Wikipedia, the free encyclopedia
Figure: Sensitivity and specificity. The left half of the image, with the solid dots, represents individuals who have the condition; the right half, with the hollow dots, represents individuals who do not have the condition. The circle represents all individuals who tested positive.

In medicine and statistics, sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a medical condition. If individuals who have the condition are considered "positive" and those who do not are considered "negative", then sensitivity is a measure of how well a test can identify true positives and specificity is a measure of how well a test can identify true negatives:

  • Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
  • Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.

If the true status of the condition cannot be known, sensitivity and specificity can be defined relative to a "gold standard test" which is assumed correct. For all testing, both diagnostic and screening, there is usually a trade-off between sensitivity and specificity, such that higher sensitivities will mean lower specificities and vice versa.

A test which reliably detects the presence of a condition, resulting in a high number of true positives and low number of false negatives, will have a high sensitivity. This is especially important when the consequence of failing to treat the condition is serious and/or the treatment is very effective and has minimal side effects.

A test which reliably excludes individuals who do not have the condition, resulting in a high number of true negatives and low number of false positives, will have a high specificity. This is especially important when people who are identified as having a condition may be subjected to more testing, expense, stigma, anxiety, etc.


The terms "sensitivity" and "specificity" were introduced by American biostatistician Jacob Yerushalmy in 1947.[1]

There are different definitions within laboratory quality control, wherein "analytical sensitivity" is defined as the smallest amount of substance in a sample that can accurately be measured by an assay (synonymous with the detection limit), and "analytical specificity" is defined as the ability of an assay to measure one particular organism or substance, rather than others.[2] However, this article deals with diagnostic sensitivity and specificity as defined above.

Application to screening study

Imagine a study evaluating a test that screens people for a disease. Each person taking the test either has or does not have the disease. The test outcome can be positive (classifying the person as having the disease) or negative (classifying the person as not having the disease). The test results for each subject may or may not match the subject's actual status. In that setting:

  • True positive: Sick people correctly identified as sick
  • False positive: Healthy people incorrectly identified as sick
  • True negative: Healthy people correctly identified as healthy
  • False negative: Sick people incorrectly identified as healthy

After getting the numbers of true positives, false positives, true negatives, and false negatives, the sensitivity and specificity for the test can be calculated. If it turns out that the sensitivity is high then any person who has the disease is likely to be classified as positive by the test. On the other hand, if the specificity is high, any person who does not have the disease is likely to be classified as negative by the test. An NIH web site has a discussion of how these ratios are calculated.[3]
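As a minimal sketch of that calculation, the snippet below derives both ratios from the four outcome counts. The counts and the names `ScreeningCounts`, `sensitivity`, and `specificity` are hypothetical, chosen only for illustration; they do not come from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Counts of the four possible outcomes of a screening study."""
    tp: int  # sick people correctly identified as sick
    fp: int  # healthy people incorrectly identified as sick
    tn: int  # healthy people correctly identified as healthy
    fn: int  # sick people incorrectly identified as healthy


def sensitivity(c: ScreeningCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ScreeningCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical study: 100 sick and 100 healthy participants.
counts = ScreeningCounts(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity = {sensitivity(counts):.2f}")  # sensitivity = 0.90
print(f"specificity = {specificity(counts):.2f}")  # specificity = 0.80
```

Note that each ratio uses only one row of the study: sensitivity looks only at sick participants, specificity only at healthy ones.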

Definition

Sensitivity

Consider the example of a medical test for diagnosing a condition. Sensitivity (sometimes also named the detection rate in a clinical setting) refers to the test's ability to correctly detect ill patients out of those who do have the condition.[4] Mathematically, this can be expressed as:

\[ \text{sensitivity} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} = \frac{TP}{TP + FN} \]

A negative result in a test with high sensitivity can be useful for "ruling out" disease,[4] since it rarely misdiagnoses those who do have the disease. A test with 100% sensitivity will recognize all patients with the disease by testing positive. In this case, a negative test result would definitively rule out the presence of the disease in a patient. However, a positive result in a test with high sensitivity is not necessarily useful for "ruling in" disease. Suppose a 'bogus' test kit is designed to always give a positive reading. When used on diseased patients, all patients test positive, giving the test 100% sensitivity. However, sensitivity does not take into account false positives. The bogus test also returns positive on all healthy patients, giving it a false positive rate of 100%, rendering it useless for detecting or "ruling in" the disease.[citation needed]

The calculation of sensitivity does not take into account indeterminate test results. If a test cannot be repeated, indeterminate samples either should be excluded from the analysis (the number of exclusions should be stated when quoting sensitivity) or can be treated as false negatives (which gives the worst-case value for sensitivity and may therefore underestimate it).[citation needed]

A test with a higher sensitivity has a lower type II error rate.

Specificity

Consider the example of a medical test for diagnosing a disease. Specificity refers to the test's ability to correctly reject healthy patients, that is, those without the condition. Mathematically, this can be written as:

\[ \text{specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} = \frac{TN}{TN + FP} \]

A positive result in a test with high specificity can be useful for "ruling in" disease, since the test rarely gives positive results in healthy patients.[5] A test with 100% specificity will recognize all patients without the disease by testing negative, so a positive test result would definitively rule in the presence of the disease. However, a negative result from a test with high specificity is not necessarily useful for "ruling out" disease. For example, a test that always returns a negative test result will have a specificity of 100% because specificity does not consider false negatives. A test like that would return negative for patients with the disease, making it useless for "ruling out" the disease.
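The two degenerate tests described in this section (one that always reads positive, one that always reads negative) can be checked directly. The following is a small sketch on a made-up cohort of 5 sick and 15 healthy people; the helper `rates` is a hypothetical name, and it assumes both classes are non-empty (otherwise the divisions would fail).

```python
def rates(actual: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for paired actual/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)          # sick, test positive
    fn = sum(a and not p for a, p in pairs)      # sick, test negative
    tn = sum(not a and not p for a, p in pairs)  # healthy, test negative
    fp = sum(not a and p for a, p in pairs)      # healthy, test positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 5 people with the disease, 15 without.
actual = [True] * 5 + [False] * 15
always_positive = [True] * len(actual)   # the "bogus" kit
always_negative = [False] * len(actual)

print(rates(actual, always_positive))  # (1.0, 0.0)
print(rates(actual, always_negative))  # (0.0, 1.0)
```

Each bogus test achieves a perfect score on one measure and the worst possible score on the other, which is why neither number is informative alone.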

A test with a higher specificity has a lower type I error rate.

Graphical illustration

The above graphical illustration is meant to show the relationship between sensitivity and specificity. The black, dotted line in the center of the graph is where the sensitivity and specificity are the same. As one moves to the left of the black dotted line, the sensitivity increases, reaching its maximum value of 100% at line A, and the specificity decreases. The sensitivity at line A is 100% because at that point there are zero false negatives, meaning that all the negative test results are true negatives. When moving to the right, the opposite applies, the specificity increases until it reaches the B line and becomes 100% and the sensitivity decreases. The specificity at line B is 100% because the number of false positives is zero at that line, meaning all the positive test results are true positives.

The middle solid line in both figures that show the level of sensitivity and specificity is the test cutoff point. As previously described, moving this line results in a trade-off between the level of sensitivity and specificity. The left-hand side of this line contains the data points that test below the cutoff point and are considered negative (the blue dots indicate the false negatives (FN), the white dots the true negatives (TN)). The right-hand side of the line shows the data points that test above the cutoff point and are considered positive (red dots indicate false positives (FP)). Each side contains 40 data points.

For the figure that shows high sensitivity and low specificity, there are 3 FN and 8 FP. Using the fact that positive results = true positives (TP) + FP, we get TP = positive results − FP, or TP = 40 − 8 = 32. The number of sick people in the data set is equal to TP + FN, or 32 + 3 = 35. The sensitivity is therefore 32 / 35 = 91.4%. Using the same method, we get TN = 40 − 3 = 37, and the number of healthy people is 37 + 8 = 45, which results in a specificity of 37 / 45 = 82.2%.

For the figure that shows low sensitivity and high specificity, there are 8 FN and 3 FP. Using the same method as for the previous figure, we get TP = 40 − 3 = 37. The number of sick people is 37 + 8 = 45, which gives a sensitivity of 37 / 45 = 82.2%. There are 40 − 8 = 32 TN. The specificity therefore comes out to 32 / 35 = 91.4%.
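The arithmetic for both figures follows the same pattern, so it can be sketched as one small function. The function name `rates_from_figure` and its parameters are hypothetical; the only inputs are the FN and FP counts and the 40 points on each side of the cutoff stated above.

```python
def rates_from_figure(fn: int, fp: int, per_side: int = 40) -> tuple[float, float]:
    """Return (sensitivity, specificity) given FN, FP and points per side."""
    # Points right of the cutoff test positive: TP + FP = per_side.
    tp = per_side - fp
    # Points left of the cutoff test negative: TN + FN = per_side.
    tn = per_side - fn
    sensitivity = tp / (tp + fn)  # TP / number of sick people
    specificity = tn / (tn + fp)  # TN / number of healthy people
    return sensitivity, specificity


high_sens = rates_from_figure(fn=3, fp=8)  # first figure
low_sens = rates_from_figure(fn=8, fp=3)   # second figure
print(f"{high_sens[0]:.1%}, {high_sens[1]:.1%}")  # 91.4%, 82.2%
print(f"{low_sens[0]:.1%}, {low_sens[1]:.1%}")    # 82.2%, 91.4%
```

Swapping the FN and FP counts swaps the two rates, which mirrors the cutoff moving from one side of the crossover point to the other.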

The red dot indicates the patient with the medical condition. The red background indicates the area where the test predicts the data point to be positive. In this figure there are 6 true positives and 0 false negatives (because every case with the condition is correctly predicted as positive). Therefore, the sensitivity is 100% (from 6 / (6 + 0)). This situation is also illustrated in the previous figure, where the dotted line is at position A (the left-hand side is predicted as negative by the model, the right-hand side as positive). When the dotted line, the test cutoff line, is at position A, the test correctly predicts the entire true positive class, but it will fail to correctly identify data points from the true negative class.

Similar to the previously explained figure, the red dot indicates the patient with the medical condition. However, in this case, the green background indicates that the test predicts that all patients are free of the medical condition. The number of true negative data points is then 26, and the number of false positives is 0. This results in 100% specificity (from 26 / (26 + 0)). Therefore, sensitivity or specificity alone cannot be used to measure the performance of the test.

Medical usage

In medical diagnosis, test sensitivity is the ability of a test to correctly identify those with the disease (true positive rate), whereas test specificity is the ability of the test to correctly identify those without the disease (true negative rate). If 100 patients known to have a disease were tested, and 43 test positive, then the test has 43% sensitivity. If 100 with no disease are tested and 96 return a completely negative result, then the test has 96% specificity. Sensitivity and specificity are prevalence-independent test characteristics, as their values are intrinsic to the test and do not depend on the disease prevalence in the population of interest.[6] Positive and negative predictive values, but not sensitivity or specificity, are influenced by the prevalence of disease in the population that is being tested. These concepts are illustrated graphically in the applet "Bayesian clinical diagnostic model", which shows the positive and negative predictive values as a function of the prevalence, sensitivity and specificity.
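To make the prevalence dependence concrete, here is a rough sketch that applies Bayes' rule to the 43% sensitivity and 96% specificity quoted above. The function name `predictive_values` and the three prevalence values are assumptions chosen for illustration.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    # Joint probabilities of each cell of the 2x2 table.
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    ppv = tp / (tp + fp)  # P(disease | positive test)
    npv = tn / (tn + fn)  # P(no disease | negative test)
    return ppv, npv


# Same test (43% sensitivity, 96% specificity) at three assumed prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.43, 0.96, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

The test's sensitivity and specificity never change in the loop; only the prevalence does, yet the positive predictive value rises and the negative predictive value falls as the disease becomes more common.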

Misconceptions

It is often claimed that a highly specific test is effective at ruling in a disease when positive, while a highly sensitive test is deemed effective at ruling out a disease when negative.[7][8] This has led to the widely used mnemonics SPPIN and SNNOUT, according to which a highly specific test, when positive, rules in disease (SP-P-IN), and a highly sensitive test, when negative, rules out disease (SN-N-OUT). Both rules of thumb are, however, inferentially misleading, as the diagnostic power of any test is determined by the prevalence of the condition being tested, the test's sensitivity and its specificity.[9][10][11] The SNNOUT mnemonic has some validity when the prevalence of the condition in question is extremely low in the tested sample.

The trade-off between specificity and sensitivity is explored in ROC analysis as a trade-off between TPR and FPR (that is, recall and fall-out).[12] Giving them equal weight optimizes informedness = specificity + sensitivity − 1 = TPR − FPR, the magnitude of which gives the probability of an informed decision between the two classes (> 0 represents appropriate use of information, 0 represents chance-level performance, < 0 represents perverse use of information).[13]
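One way to see this trade-off is to sweep a cutoff over test scores and pick the one with the highest informedness. The sketch below assumes a test that outputs a numeric score, with scores at or above the cutoff read as positive; the score lists and helper names are hypothetical.

```python
def informedness(sensitivity: float, specificity: float) -> float:
    """Informedness = sensitivity + specificity - 1 = TPR - FPR."""
    return sensitivity + specificity - 1


# Hypothetical test scores; a score >= cutoff is read as a positive result.
sick_scores = [0.9, 0.8, 0.7, 0.6, 0.35]
healthy_scores = [0.5, 0.4, 0.3, 0.2, 0.1]


def rates_at(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) at the given cutoff."""
    sens = sum(s >= cutoff for s in sick_scores) / len(sick_scores)
    spec = sum(h < cutoff for h in healthy_scores) / len(healthy_scores)
    return sens, spec


# Try every observed score as a candidate cutoff.
candidates = sorted(set(sick_scores + healthy_scores))
best = max(candidates, key=lambda c: informedness(*rates_at(c)))
print(best, rates_at(best))  # 0.6 (0.8, 1.0)
```

Lowering the cutoff below `best` gains sensitivity but loses more specificity, and raising it does the reverse; equal weighting of the two rates is itself a design choice that may not suit every clinical setting.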

Sensitivity index

The sensitivity index or d′ (pronounced "dee-prime") is a statistic used in signal detection theory. It provides the separation between the means of the signal and the noise distributions, compared against the standard deviation of the noise distribution. For normally distributed signal and noise with means and standard deviations $\mu_S$ and $\sigma_S$, and $\mu_N$ and $\sigma_N$, respectively, d′ is defined as:

\[ d' = \frac{\mu_S - \mu_N}{\sqrt{\tfrac{1}{2}\left(\sigma_S^2 + \sigma_N^2\right)}} \] [14]

An estimate of d′ can also be found from measurements of the hit rate and false-alarm rate. It is calculated as:

d′ = Z(hit rate) − Z(false alarm rate),[15]

where function Z(p), p ∈ [0, 1], is the inverse of the cumulative Gaussian distribution.

d′ is a dimensionless statistic. A higher d′ indicates that the signal can be more readily detected.
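The estimate from hit and false-alarm rates can be sketched with the Python standard library, where `statistics.NormalDist().inv_cdf` plays the role of Z. The function name `d_prime` and the example rates are illustrative assumptions. Note that `inv_cdf` raises an error for rates of exactly 0 or 1, so such rates would need adjusting before use.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Estimate d' = Z(hit rate) - Z(false-alarm rate).

    Z is the inverse of the standard normal cumulative distribution.
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates: 84% hits, 16% false alarms.
print(f"{d_prime(0.84, 0.16):.2f}")  # 1.99
# Equal hit and false-alarm rates mean the signal is not detected at all.
print(f"{d_prime(0.50, 0.50):.2f}")  # 0.00
```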

Confusion matrix

The relationship between sensitivity, specificity, and similar terms can be understood using the following table. Consider a group with P positive instances and N negative instances of some condition. The four outcomes can be formulated in a 2×2 contingency table or confusion matrix, from which several metrics can be derived, as follows:

Sources: [16][17][18][19][20][21][22][23][24]

Total population = P + N

Confusion matrix cells (actual condition × predicted condition):
  • Actual positive (P),[a] predicted positive (PP): true positive (TP), hit[b]
  • Actual positive (P), predicted negative (PN): false negative (FN), miss, underestimation
  • Actual negative (N),[d] predicted positive (PP): false positive (FP), false alarm, overestimation
  • Actual negative (N), predicted negative (PN): true negative (TN), correct rejection[e]

Rates conditioned on the actual condition:
  • True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power: $TPR = TP / P = 1 - FNR$
  • False negative rate (FNR), miss rate, type II error:[c] $FNR = FN / P = 1 - TPR$
  • False positive rate (FPR), probability of false alarm, fall-out, type I error:[f] $FPR = FP / N = 1 - TNR$
  • True negative rate (TNR), specificity (SPC), selectivity: $TNR = TN / N = 1 - FPR$

Rates conditioned on the predicted condition:
  • Positive predictive value (PPV), precision: $PPV = TP / PP = 1 - FDR$
  • False discovery rate (FDR): $FDR = FP / PP = 1 - PPV$
  • False omission rate (FOR): $FOR = FN / PN = 1 - NPV$
  • Negative predictive value (NPV): $NPV = TN / PN = 1 - FOR$

Derived metrics:
  • Prevalence: $P / (P + N)$
  • Accuracy (ACC): $(TP + TN) / (P + N)$
  • Balanced accuracy (BA): $(TPR + TNR) / 2$
  • F1 score: $2 \, PPV \times TPR / (PPV + TPR) = 2 \, TP / (2 \, TP + FP + FN)$
  • Informedness, bookmaker informedness (BM): $TPR + TNR - 1$
  • Markedness (MK), deltaP (Δp): $PPV + NPV - 1$
  • Prevalence threshold (PT): $(\sqrt{TPR \times FPR} - FPR) / (TPR - FPR)$
  • Positive likelihood ratio (LR+): $TPR / FPR$
  • Negative likelihood ratio (LR−): $FNR / TNR$
  • Diagnostic odds ratio (DOR): $LR{+} / LR{-}$
  • Fowlkes–Mallows index (FM): $\sqrt{PPV \times TPR}$
  • Matthews correlation coefficient (MCC): $\sqrt{TPR \times TNR \times PPV \times NPV} - \sqrt{FNR \times FPR \times FOR \times FDR}$
  • Threat score (TS), critical success index (CSI), Jaccard index: $TP / (TP + FN + FP)$

Table notes:
  a. the number of real positive cases in the data
  b. A test result that correctly indicates the presence of a condition or characteristic
  c. Type II error: A test result which wrongly indicates that a particular condition or attribute is absent
  d. the number of real negative cases in the data
  e. A test result that correctly indicates the absence of a condition or characteristic
  f. Type I error: A test result which wrongly indicates that a particular condition or attribute is present
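The derivations in the table can be sketched as a single function over the four cell counts. This is an illustrative sketch, not a reference implementation: the function name `derived_metrics` and the example counts are hypothetical, the Matthews correlation coefficient is computed in its rate form $\sqrt{TPR \times TNR \times PPV \times NPV} - \sqrt{FNR \times FPR \times FOR \times FDR}$, and the code assumes no row or column of the matrix is empty. The likelihood ratios are left out because they are undefined when FPR or TNR is zero.

```python
from math import sqrt


def derived_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive common confusion-matrix metrics from the four cell counts."""
    p, n = tp + fn, tn + fp        # actual positives / negatives
    pp, pn = tp + fp, tn + fn      # predicted positives / negatives
    tpr, tnr = tp / p, tn / n      # sensitivity, specificity
    fnr, fpr = fn / p, fp / n      # miss rate, fall-out
    ppv, npv = tp / pp, tn / pn    # precision, negative predictive value
    fdr, for_ = fp / pp, fn / pn   # false discovery / false omission rate
    return {
        "TPR": tpr, "TNR": tnr, "FNR": fnr, "FPR": fpr,
        "PPV": ppv, "NPV": npv, "FDR": fdr, "FOR": for_,
        "prevalence": p / (p + n),
        "ACC": (tp + tn) / (p + n),
        "BA": (tpr + tnr) / 2,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "FM": sqrt(ppv * tpr),
        "MCC": sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * for_ * fdr),
        "TS": tp / (tp + fn + fp),
    }


# Hypothetical counts, chosen only for illustration.
metrics = derived_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name:>10} = {value:.3f}")
```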


A worked example

A diagnostic test with sensitivity 67% and specificity 91% is applied to 2030 people to look for a disorder with a population prevalence of 1.48%.

Fecal occult blood screen test outcome; total population (pop.) = 2030.

Patients with bowel cancer (as confirmed on endoscopy):
  • Actual condition positive (AP) = 30 (2030 × 1.48%)
      – Test outcome positive: true positive (TP) = 20 (2030 × 1.48% × 67%)
      – Test outcome negative: false negative (FN) = 10 (2030 × 1.48% × (100% − 67%))
  • Actual condition negative (AN) = 2000 (2030 × (100% − 1.48%))
      – Test outcome positive: false positive (FP) = 180 (2030 × (100% − 1.48%) × (100% − 91%))
      – Test outcome negative: true negative (TN) = 1820 (2030 × (100% − 1.48%) × 91%)

Rates conditioned on the actual condition:
  • True positive rate (TPR), recall, sensitivity = TP / AP = 20 / 30 ≈ 66.7%
  • False negative rate (FNR), miss rate = FN / AP = 10 / 30 ≈ 33.3%
  • False positive rate (FPR), fall-out, probability of false alarm = FP / AN = 180 / 2000 = 9.0%
  • Specificity, selectivity, true negative rate (TNR) = TN / AN = 1820 / 2000 = 91%

Rates conditioned on the test outcome:
  • Positive predictive value (PPV), precision = TP / (TP + FP) = 20 / (20 + 180) = 10%
  • False discovery rate (FDR) = FP / (TP + FP) = 180 / (20 + 180) = 90.0%
  • False omission rate (FOR) = FN / (FN + TN) = 10 / (10 + 1820) ≈ 0.55%
  • Negative predictive value (NPV) = TN / (FN + TN) = 1820 / (10 + 1820) ≈ 99.45%

Derived metrics:
  • Prevalence = AP / pop. = 30 / 2030 ≈ 1.48%
  • Accuracy (ACC) = (TP + TN) / pop. = (20 + 1820) / 2030 ≈ 90.64%
  • F1 score = 2 × precision × recall / (precision + recall) ≈ 0.174
  • Positive likelihood ratio (LR+) = TPR / FPR = (20 / 30) / (180 / 2000) ≈ 7.41
  • Negative likelihood ratio (LR−) = FNR / TNR = (10 / 30) / (1820 / 2000) ≈ 0.366
  • Diagnostic odds ratio (DOR) = LR+ / LR− ≈ 20.2

Related calculations

  • False positive rate (α) = type I error = 1 − specificity = FP / (FP + TN) = 180 / (180 + 1820) = 9%
  • False negative rate (β) = type II error = 1 − sensitivity = FN / (TP + FN) = 10 / (20 + 10) ≈ 33%
  • Power = sensitivity = 1 − β
  • Positive likelihood ratio = sensitivity / (1 − specificity) ≈ 0.67 / (1 − 0.91) ≈ 7.4
  • Negative likelihood ratio = (1 − sensitivity) / specificity ≈ (1 − 0.67) / 0.91 ≈ 0.37
  • Prevalence threshold = $(\sqrt{TPR \times FPR} - FPR) / (TPR - FPR)$ ≈ 0.2686 ≈ 26.9%
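The figures of the worked example can be reproduced in a few lines. This is a sketch under stated assumptions: counts are rounded to whole people, as in the example, and the prevalence threshold is assumed to take the form (√(TPR × FPR) − FPR) / (TPR − FPR); the variable names are illustrative.

```python
from math import sqrt

population = 2030
prevalence = 0.0148
sens, spec = 0.67, 0.91  # quoted sensitivity and specificity

# Round to whole people, as the worked example does.
ap = round(population * prevalence)  # actual positives
an = population - ap                 # actual negatives
tp = round(ap * sens)
fn = ap - tp
tn = round(an * spec)
fp = an - tn

tpr, fpr = tp / ap, fp / an          # sensitivity, false positive rate
fnr, tnr = fn / ap, tn / an          # false negative rate, specificity
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
lr_pos = tpr / fpr
lr_neg = fnr / tnr
dor = lr_pos / lr_neg
# Assumed form of the prevalence threshold (see lead-in).
pt = (sqrt(tpr * fpr) - fpr) / (tpr - fpr)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"PPV={ppv:.1%} NPV={npv:.2%}")
print(f"LR+={lr_pos:.2f} LR-={lr_neg:.3f} DOR={dor:.1f} PT={pt:.1%}")
```

Because the counts are rounded, the realized sensitivity is 20 / 30 ≈ 66.7% rather than exactly the quoted 67%, which explains small differences in the last digit of some derived values.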

This hypothetical screening test (fecal occult blood test) correctly identified two-thirds (66.7%) of patients with colorectal cancer.[a] Unfortunately, factoring in prevalence rates reveals that this hypothetical test has a high false positive rate, and it does not reliably identify colorectal cancer in the overall population of asymptomatic people (PPV = 10%).

On the other hand, this hypothetical test demonstrates very accurate detection of cancer-free individuals (NPV ≈ 99.5%). Therefore, when used for routine colorectal cancer screening with asymptomatic adults, a negative result supplies important data for the patient and doctor, such as ruling out cancer as the cause of gastrointestinal symptoms or reassuring patients worried about developing colorectal cancer.

Estimation of errors in quoted sensitivity or specificity

Sensitivity and specificity values alone may be highly misleading. The 'worst-case' sensitivity or specificity must be calculated in order to avoid reliance on experiments with few results. For example, a particular test may easily show 100% sensitivity if tested against the gold standard four times, but a single additional test against the gold standard that gave a poor result would imply a sensitivity of only 80%. A common way to do this is to state the binomial proportion confidence interval, often calculated using a Wilson score interval.

Confidence intervals for sensitivity and specificity can be calculated, giving the range of values within which the correct value lies at a given confidence level (e.g., 95%).[27]
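As an illustration of the Wilson score interval mentioned above, the sketch below computes an approximate 95% interval for the case of four correct results out of four. The function name `wilson_interval` is hypothetical, and the default z = 1.96 is the usual normal quantile for a 95% confidence level.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# 4 true positives out of 4 diseased subjects: observed sensitivity is 100%.
low, high = wilson_interval(4, 4)
print(f"sensitivity 100%, 95% CI roughly {low:.0%} to {high:.0%}")
```

With so few subjects the lower bound is only about 51%, which shows how little an observed 100% sensitivity says when the sample is small.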

Terminology in information retrieval

In information retrieval, the positive predictive value is called precision, and sensitivity is called recall. Unlike the trade-off between specificity and sensitivity, these measures are both independent of the number of true negatives, which is generally unknown and much larger than the actual numbers of relevant and retrieved documents. This assumption of very large numbers of true negatives versus positives is rare in other applications.[13]

The F-score can be used as a single measure of performance of the test for the positive class. The F-score is the harmonic mean of precision and recall:

\[ F = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]

In the traditional language of statistical hypothesis testing, the sensitivity of a test is called the statistical power of the test, although the word power in that context has a more general usage that is not applicable in the present context. A sensitive test will have fewer Type II errors.

Terminology in genome analysis

Similarly to the domain of information retrieval, in the research area of gene prediction, the number of true negatives (non-genes) in genomic sequences is generally unknown and much larger than the actual number of genes (true positives). In this research area, the convenient and intuitively understood term specificity has frequently been used with the mathematical formula for precision as defined in biostatistics. The pair of thus defined specificity (as positive predictive value) and sensitivity (true positive rate) represent major parameters characterizing the accuracy of gene prediction algorithms.[28][29][30][31] Conversely, the term specificity in the sense of true negative rate would have little, if any, application in the genome analysis research area.

See also

Notes

  a. There are advantages and disadvantages for all medical screening tests. Clinical practice guidelines, such as those for colorectal cancer screening, describe these risks and benefits.[25][26]

References

  1. Yerushalmy J (1947). "Statistical problems in assessing methods of medical diagnosis with special reference to x-ray techniques". Public Health Reports. 62 (2): 1432–39. doi:10.2307/4586294. JSTOR 4586294. PMID 20340527. S2CID 19967899.
  2. Saah AJ, Hoover DR (1998). "[Sensitivity and specificity revisited: significance of the terms in analytic and diagnostic language]". Ann Dermatol Venereol. 125 (4): 291–4. PMID 9747274.
  3. Parikh R, Mathai A, Parikh S, Chandra Sekhar G, Thomas R (2008). "Understanding and using sensitivity, specificity and predictive values". Indian Journal of Ophthalmology. 56 (1): 45–50. doi:10.4103/0301-4738.37595. PMC 2636062. PMID 18158403.
  4. Altman DG, Bland JM (June 1994). "Diagnostic tests. 1: Sensitivity and specificity". BMJ. 308 (6943): 1552. doi:10.1136/bmj.308.6943.1552. PMC 2540489. PMID 8019315.
  5. "SpPin and SnNout". Centre for Evidence Based Medicine (CEBM). Retrieved 18 January 2023.
  6. Mangrulkar R. "Diagnostic Reasoning I and II". Retrieved 24 January 2012.
  7. "Evidence-Based Diagnosis". Michigan State University. Archived from the original on 2013-07-06. Retrieved 2013-08-23.
  8. "Sensitivity and Specificity". Emory University Medical School Evidence Based Medicine course.
  9. Baron JA (Apr–Jun 1994). "Too bad it isn't true". Medical Decision Making. 14 (2): 107. doi:10.1177/0272989X9401400202. PMID 8028462. S2CID 44505648.
  10. Boyko EJ (Apr–Jun 1994). "Ruling out or ruling in disease with the most sensitive or specific diagnostic test: short cut or wrong turn?". Medical Decision Making. 14 (2): 175–9. doi:10.1177/0272989X9401400210. PMID 8028470. S2CID 31400167.
  11. Pewsner D, Battaglia M, Minder C, Marx A, Bucher HC, Egger M (July 2004). "Ruling a diagnosis in or out with "SpPIn" and "SnNOut": a note of caution". BMJ. 329 (7459): 209–13. doi:10.1136/bmj.329.7459.209. PMC 487735. PMID 15271832.
  12. Fawcett T (2006). "An Introduction to ROC Analysis". Pattern Recognition Letters. 27 (8): 861–874. Bibcode:2006PaReL..27..861F. CiteSeerX 10.1.1.646.2144. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
  13. Powers DM (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  14. Gale SD, Perkel DJ (January 2010). "A basal ganglia pathway drives selective auditory responses in songbird dopaminergic neurons via disinhibition". The Journal of Neuroscience. 30 (3): 1027–37. doi:10.1523/JNEUROSCI.3585-09.2010. PMC 2824341. PMID 20089911.
  15. Macmillan NA, Creelman CD (15 September 2004). Detection Theory: A User's Guide. Psychology Press. p. 7. ISBN 978-1-4106-1114-7.
  16. Balayla J (2020). "Prevalence threshold (ϕe) and the geometry of screening curves". PLOS ONE. 15 (10): e0240215. doi:10.1371/journal.pone.0240215. PMID 33027310.
  17. Fawcett T (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
  18. Piryonesi S. Madeh, El-Diraby Tamer E. (2020-03-01). "Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index". Journal of Infrastructure Systems. 26 (1): 04019036. doi:10.1061/(ASCE)IS.1943-555X.0000512. S2CID 213782055.
  19. Powers DM (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  20. Ting KM (2011). Sammut C, Webb GI (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  21. Brooks H, Brown B, Ebert B, Ferro C, Jolliffe I, Koh TY, Roebber P, Stephenson D (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  22. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  23. Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
  24. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003.
  25. Lin JS, Piper MA, Perdue LA, Rutter CM, Webber EM, O'Connor E, Smith N, Whitlock EP (21 June 2016). "Screening for Colorectal Cancer". JAMA. 315 (23): 2576–2594. doi:10.1001/jama.2016.3332. ISSN 0098-7484. PMID 27305422.
  26. Bénard F, Barkun AN, Martel M, Renteln Dv (7 January 2018). "Systematic review of colorectal cancer screening guidelines for average-risk adults: Summarizing the current global recommendations". World Journal of Gastroenterology. 24 (1): 124–138. doi:10.3748/wjg.v24.i1.124. PMC 5757117. PMID 29358889.
  27. "Diagnostic test online calculator calculates sensitivity, specificity, likelihood ratios and predictive values from a 2x2 table – calculator of confidence intervals for predictive parameters". medcalc.org.
  28. Burge C, Karlin S (1997). "Prediction of complete gene structures in human genomic DNA" (PDF). Journal of Molecular Biology. 268 (1): 78–94. CiteSeerX 10.1.1.115.3107. doi:10.1006/jmbi.1997.0951. PMID 9149143. Archived from the original (PDF) on 2015-06-20.
  29. "GeneMark-ES". Lomsadze A (2005). "Gene finding in novel genomes by self-training algorithm". Nucleic Acids Research. 33 (20): 6494–6906. doi:10.1093/nar/gki937. PMC 1298918. PMID 16314312.
  30. Korf I (2004). "Gene finding in novel genomes". BMC Bioinformatics. 5: 59. doi:10.1186/1471-2105-5-59. PMC 421630. PMID 15144565.
  31. Yandell M, Ence D (April 2012). "A beginner's guide to eukaryotic genome annotation". Nature Reviews. Genetics. 13 (5): 329–42. doi:10.1038/nrg3174. PMID 22510764. S2CID 3352427.

Further reading

External links