Gaussian_Process_in_Machine_Learning

Symbols and Notation

Matrices are capitalized and vectors are in bold type. We do not generally distinguish between probabilities and probability densities. A subscript asterisk, such as in $X_{*}$, indicates reference to a test set quantity. A superscript asterisk denotes complex conjugate.

Symbol: Meaning

$\backslash$: left matrix divide; $A \backslash \mathbf{b}$ is the vector $\mathbf{x}$ which solves $A \mathbf{x} = \mathbf{b}$

$\triangleq$: an equality which acts as a definition

$\overset{c}{=}$: equality up to an additive constant

$|K|$: determinant of the matrix $K$

$|\mathbf{y}|$: Euclidean length of vector $\mathbf{y}$, i.e. $\left(\sum_{i} y_{i}^{2}\right)^{1/2}$

$\langle f, g \rangle_{\mathcal{H}}$: RKHS (reproducing kernel Hilbert space) inner product

$\| f \|_{\mathcal{H}}$: RKHS norm

$\mathbf{y}^{\top}$: the transpose of vector $\mathbf{y}$

$\propto$: proportional to; e.g. $p(x \mid y) \propto f(x, y)$ means that $p(x \mid y)$ is equal to $f(x, y)$ times a factor which is independent of $x$

$\sim$: distributed according to; example: $x \sim \mathcal{N}(\mu, \sigma^{2})$

$\nabla$ or $\nabla_{\mathbf{f}}$: partial derivatives (w.r.t. $\mathbf{f}$)

$\nabla\nabla$: the (Hessian) matrix of second derivatives

$\mathbf{0}$ or $\mathbf{0}_{n}$: vector of all 0's (of length $n$)

$\mathbf{1}$ or $\mathbf{1}_{n}$: vector of all 1's (of length $n$)

$C$: number of classes in a classification problem

cholesky($A$): Cholesky decomposition; $L$ is a lower triangular matrix such that $L L^{\top} = A$

$\operatorname{cov}(f_{*})$: Gaussian process posterior covariance

$D$: dimension of input space $\mathcal{X}$

$\mathcal{D}$: data set, $\mathcal{D} = \{(\mathbf{x}_{i}, y_{i}) \mid i = 1, \dots, n\}$

diag($\mathbf{w}$): (vector argument) a diagonal matrix containing the elements of vector $\mathbf{w}$

diag($W$): (matrix argument) a vector containing the diagonal elements of matrix $W$

$\delta_{pq}$: Kronecker delta, $\delta_{pq} = 1$ iff $p = q$ and 0 otherwise

$\mathbb{E}$ or $\mathbb{E}_{q(x)}[z(x)]$: expectation; expectation of $z(x)$ when $x \sim q(x)$

$f(\mathbf{x})$ or $\mathbf{f}$: Gaussian process (or vector of) latent function values, $\mathbf{f} = (f(\mathbf{x}_{1}), \dots, f(\mathbf{x}_{n}))^{\top}$

$f_{*}$: Gaussian process (posterior) prediction (random variable)

$\bar{f}_{*}$: Gaussian process posterior mean

$\mathcal{GP}$: Gaussian process; $f \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$ means that the function $f$ is distributed as a Gaussian process with mean function $m(\mathbf{x})$ and covariance function $k(\mathbf{x}, \mathbf{x}')$

$h(\mathbf{x})$ or $\mathbf{h}(\mathbf{x})$: either fixed basis function (or set of basis functions) or weight function

$H$ or $H(X)$: set of basis functions evaluated at all training points

$I$ or $I_{n}$: the identity matrix (of size $n$)

$J_{\nu}(z)$: Bessel function of the first kind

$k(\mathbf{x}, \mathbf{x}')$: covariance (or kernel) function evaluated at $\mathbf{x}$ and $\mathbf{x}'$

$K$ or $K(X, X)$: $n \times n$ covariance (or Gram) matrix

$K_{*}$: $n \times n_{*}$ matrix $K(X, X_{*})$, the covariance between training and test cases

$\mathbf{k}(\mathbf{x}_{*})$ or $\mathbf{k}_{*}$: vector, short for $K(X, \mathbf{x}_{*})$, when there is only a single test case

$K_{f}$ or $K$: covariance matrix for the (noise free) $\mathbf{f}$ values
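As an illustration of the `\` and cholesky($A$) entries above, the following is a minimal sketch in plain Python, assuming $A$ is symmetric positive definite. The function names `cholesky`, `forward_substitute`, `back_substitute_transposed` and `left_divide` are chosen here for illustration and are not taken from the book. It computes $L$ with $L L^{\top} = A$ and then obtains $A \backslash \mathbf{b}$ by two triangular solves.

```python
import math


def cholesky(A):
    # Return lower-triangular L with L L^T = A (A symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_substitute(L, b):
    # Solve L z = b for lower-triangular L.
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def back_substitute_transposed(L, z):
    # Solve L^T x = z, reading the upper-triangular L^T out of L.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x


def left_divide(A, b):
    # x = A \ b, i.e. the solution of A x = b, computed as L^T \ (L \ b).
    L = cholesky(A)
    return back_substitute_transposed(L, forward_substitute(L, b))


A = [[4.0, 2.0], [2.0, 3.0]]
b = [2.0, 1.0]
x = left_divide(A, b)
print(x)
```

Going through the Cholesky factor rather than forming an explicit inverse is a common choice for covariance matrices, since they are symmetric and (with noise added) positive definite.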

C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, ISBN 026218253X. (c) 2006 Massachusetts Institute of Technology. www.GaussianProcess.org/gpml


$K_{y}$: covariance matrix for the (noisy) $\mathbf{y}$ values; for independent homoscedastic noise, $K_{y} = K_{f} + \sigma_{n}^{2} I$

$K_{\nu}(z)$: modified Bessel function

$L(a, b)$: loss function, the loss of predicting $b$ when $a$ is true; note argument order

$\log(z)$: natural logarithm (base $e$)

$\log_{2}(z)$: logarithm to the base 2

$\ell$ or $\ell_{d}$: characteristic length-scale (for input dimension $d$)

$\lambda(z)$: logistic function, $\lambda(z) = 1 / (1 + \exp(-z))$

$m(\mathbf{x})$: the mean function of a Gaussian process

$\mu$: a measure (see section A.7)

$\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ or $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma)$: (the variable $\mathbf{x}$ has a) Gaussian (Normal) distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$

$\mathcal{N}(\mathbf{x})$: short for unit Gaussian $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, I)$

$n$ and $n_{*}$: number of training (and test) cases

$N$: dimension of feature space

$N_{H}$: number of hidden units in a neural network

$\mathbb{N}$: the natural numbers, the positive integers

$\mathcal{O}(\cdot)$: big Oh; for functions $f$ and $g$ on $\mathbb{N}$, we write $f(n) = \mathcal{O}(g(n))$ if the ratio $f(n) / g(n)$ remains bounded as $n \to \infty$

$O$: either matrix of all zeros or differential operator

$y \mid x$ and $p(y \mid x)$: conditional random variable $y$ given $x$ and its probability (density)

$P_{N}$: the regular $n$-polygon

$\phi(\mathbf{x}_{i})$ or $\Phi(X)$: feature map of input $\mathbf{x}_{i}$ (or input set $X$)

$\Phi(z)$: cumulative unit Gaussian, $\Phi(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} \exp(-t^{2}/2) \, dt$

$\pi(\mathbf{x})$: the sigmoid of the latent value, $\pi(\mathbf{x}) = \sigma(f(\mathbf{x}))$ (stochastic if $f(\mathbf{x})$ is stochastic)

$\hat{\pi}(\mathbf{x}_{*})$: MAP prediction; $\pi$ evaluated at $\bar{f}(\mathbf{x}_{*})$

$\bar{\pi}(\mathbf{x}_{*})$: mean prediction; expected value of $\pi(\mathbf{x}_{*})$. Note that in general $\hat{\pi}(\mathbf{x}_{*}) \neq \bar{\pi}(\mathbf{x}_{*})$

$\mathbb{R}$: the real numbers

$R_{L}(f)$ or $R_{L}(c)$: the risk or expected loss for $f$, or classifier $c$ (averaged w.r.t. inputs and outputs)

$\tilde{R}_{L}(l \mid \mathbf{x}_{*})$: expected loss for predicting $l$, averaged w.r.t. the model's predictive distribution at $\mathbf{x}_{*}$

$R_{c}$: decision region for class $c$

$S(s)$: power spectrum

$\sigma(z)$: any sigmoid function, e.g. logistic $\lambda(z)$, cumulative Gaussian $\Phi(z)$, etc.

$\sigma_{f}^{2}$: variance of the (noise free) signal

$\sigma_{n}^{2}$: noise variance

$\boldsymbol{\theta}$: vector of hyperparameters (parameters of the covariance function)

$\operatorname{tr}(A)$: trace of (square) matrix $A$

$T_{l}$: the circle with circumference $l$

$\mathbb{V}$ or $\mathbb{V}_{q(x)}[z(x)]$: variance; variance of $z(x)$ when $x \sim q(x)$

$\mathcal{X}$: input space and also the index set for the stochastic process

$X$: $D \times n$ matrix of the training inputs $\{\mathbf{x}_{i}\}_{i=1}^{n}$: the design matrix

$X_{*}$: matrix of test inputs

$\mathbf{x}_{i}$: the $i$th training input

$x_{di}$: the $d$th coordinate of the $i$th training input $\mathbf{x}_{i}$

$\mathbb{Z}$: the integers $\dots, -2, -1, 0, 1, 2, \dots$
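The table notes that the MAP prediction and the mean prediction generally differ. The sketch below illustrates this with the two sigmoid functions named in the table. It assumes, purely for illustration, that the latent value at a test point is Gaussian with a given mean and variance; the function names, the numerical integration scheme, and the example numbers are not from the book.

```python
import math


def logistic(z):
    # lambda(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_gaussian(z):
    # Phi(z), expressed through the error function:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def map_prediction(sigmoid, f_mean):
    # pi-hat: the sigmoid evaluated at the mean of the latent value.
    return sigmoid(f_mean)


def mean_prediction(sigmoid, f_mean, f_var, num_points=2001, width=8.0):
    # pi-bar: E[sigmoid(f)] for f ~ N(f_mean, f_var) (assumed Gaussian),
    # by trapezoidal integration over f_mean +/- width standard deviations.
    sd = math.sqrt(f_var)
    lo = f_mean - width * sd
    hi = f_mean + width * sd
    h = (hi - lo) / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        f = lo + i * h
        density = math.exp(-0.5 * ((f - f_mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * sigmoid(f) * density
    return total * h


f_mean, f_var = 1.0, 4.0  # hypothetical latent mean and variance
pi_hat = map_prediction(cumulative_gaussian, f_mean)
pi_bar = mean_prediction(cumulative_gaussian, f_mean, f_var)
print(pi_hat, pi_bar)
```

With a positive latent mean, averaging over the latent uncertainty pulls the mean prediction toward 1/2, so `pi_bar` comes out smaller than `pi_hat` here. For the cumulative Gaussian the average also has the closed form $\Phi(\mu / \sqrt{1 + \sigma^{2}})$, which can serve as a check on the numerical integral.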
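To make the shapes of the covariance quantities in the table concrete, here is a small sketch that builds $K_{f} = K(X, X)$, $K_{*} = K(X, X_{*})$ and $K_{y} = K_{f} + \sigma_{n}^{2} I$. The squared exponential covariance function and all numeric values are assumptions made for the example, since the notation table does not fix a particular $k(\mathbf{x}, \mathbf{x}')$. The helper names are also illustrative. Inputs are stored as a list of $n$ vectors rather than as a $D \times n$ matrix.

```python
import math


def squared_exponential(x, x_prime, signal_var=1.0, length_scale=1.0):
    # Assumed example covariance function k(x, x'):
    # signal_var * exp(-|x - x'|^2 / (2 * length_scale^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def gram_matrix(inputs_a, inputs_b, k):
    # K(A, B): entry (i, j) is k(a_i, b_j).
    return [[k(a, b) for b in inputs_b] for a in inputs_a]


def add_noise(K_f, noise_var):
    # K_y = K_f + sigma_n^2 I for independent homoscedastic noise.
    n = len(K_f)
    return [
        [K_f[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


train_inputs = [[0.0], [1.0], [2.5]]  # n = 3 training inputs, D = 1
test_inputs = [[0.5], [3.0]]          # n_* = 2 test inputs

K_f = gram_matrix(train_inputs, train_inputs, squared_exponential)    # n x n
K_star = gram_matrix(train_inputs, test_inputs, squared_exponential)  # n x n_*
K_y = add_noise(K_f, noise_var=0.1)

print(len(K_f), len(K_f[0]))        # rows and columns of K_f
print(len(K_star), len(K_star[0]))  # rows and columns of K_*
```

Only the diagonal changes when noise is added, which is why $K_{y}$ and $K_{f}$ share their off-diagonal entries.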