Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations
This post is the first of a three-part series in which we set out to derive the mathematics behind feedforward neural networks. They have

- an input and an output layer, with at least one hidden layer in between,
- fully-connected layers, which means that each node in one layer connects to every node in the following layer, and
- ways to introduce nonlinearity by means of activation functions.
We start with forward propagation, which involves computing predictions and the associated cost of these predictions.
Forward Propagation
Settling on what notations to use is tricky since we only have so many letters in the Roman alphabet. As you browse the Internet, you will likely find derivations that use different notations than the ones we are about to introduce. Fortunately, there is no right or wrong here; it is just a matter of taste. In particular, the notations used in this series take inspiration from Andrew Ng’s Standard notations for Deep Learning. If you make a comparison, you will find that we only change a couple of details.
Now, whatever we come up with, we have to support

- multiple layers,
- several nodes in each layer,
- various activation functions,
- various types of cost functions, and
- mini-batches of training examples.
As a result, our definition of a node ends up introducing a fairly large number of notations:

$$a_i^{[l](k)} = g^{[l]}\left(z_i^{[l](k)}\right) = g^{[l]}\left(\sum_{j=1}^{n^{[l-1]}} w_{ij}^{[l]} \, a_j^{[l-1](k)} + b_i^{[l]}\right)$$

Does the node definition look intimidating to you at first glance? Do not worry. Hopefully, it will make more sense once we have explained the notations, which we shall do next:
| Entity | Description |
|---|---|
| $l$ | The current layer |
| $n^{[l]}$ | The number of nodes in the current layer |
| $n^{[l-1]}$ | The number of nodes in the previous layer |
| $i$ | The $i$-th node in the current layer |
| $j$ | The $j$-th node in the previous layer |
| $(k)$ | The current training example |
| $z_i^{[l](k)}$ | A weighted sum of the activations of the previous layer, shifted by a bias |
| $w_{ij}^{[l]}$ | A weight that scales the $j$-th activation of the previous layer |
| $b_i^{[l]}$ | A bias in the current layer |
| $a_i^{[l](k)}$ | An activation in the current layer |
| $a_j^{[l-1](k)}$ | An activation in the previous layer |
| $g^{[l]}$ | An activation function used in the current layer |
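To make these notations concrete, here is a minimal sketch in Python of how a single node's computation could look. The layer sizes, the sample values, and the choice of sigmoid as the activation function are all our own assumptions, made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """One possible choice for the activation function g."""
    return 1.0 / (1.0 + np.exp(-z))

n_prev, n_curr = 3, 2                  # nodes in the previous and current layer
a_prev = np.array([0.5, -0.1, 0.8])    # activations a_j of the previous layer
W = np.random.randn(n_curr, n_prev)    # weights w_ij of the current layer
b = np.zeros(n_curr)                   # biases b_i of the current layer

i = 0  # look at the first node of the current layer
# z_i = sum_j w_ij * a_j + b_i, mirroring the scalar node definition
z_i = sum(W[i, j] * a_prev[j] for j in range(n_prev)) + b[i]
a_i = sigmoid(z_i)                     # the node's activation
print(a_i)
```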
To put it concisely, a node in the current layer depends on every node in the previous layer, and the following visualization can help us see that more clearly:

Moreover, a node in the previous layer affects every node in the current layer, and with a change in highlighting, we will also be able to see that more clearly:
In the future, we might want to write an implementation from scratch in, for example, Python. To take advantage of the heavily optimized versions of vector and matrix operations that come bundled with libraries such as NumPy, we need to vectorize our equations.
To begin with, we vectorize the nodes:

$$\begin{bmatrix} z_1^{[l](k)} \\ \vdots \\ z_{n^{[l]}}^{[l](k)} \end{bmatrix} = \begin{bmatrix} w_{11}^{[l]} & \cdots & w_{1 n^{[l-1]}}^{[l]} \\ \vdots & \ddots & \vdots \\ w_{n^{[l]} 1}^{[l]} & \cdots & w_{n^{[l]} n^{[l-1]}}^{[l]} \end{bmatrix} \begin{bmatrix} a_1^{[l-1](k)} \\ \vdots \\ a_{n^{[l-1]}}^{[l-1](k)} \end{bmatrix} + \begin{bmatrix} b_1^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix}$$

which we can write as

$$z^{[l](k)} = W^{[l]} a^{[l-1](k)} + b^{[l]}, \qquad a^{[l](k)} = g^{[l]}\left(z^{[l](k)}\right)$$

where $W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$ holds the weights row by row, $b^{[l]} \in \mathbb{R}^{n^{[l]}}$ holds the biases, and $g^{[l]}$ is applied element-wise.
Next, we vectorize the training examples:

$$Z^{[l]} = \begin{bmatrix} z^{[l](1)} & \cdots & z^{[l](m)} \end{bmatrix} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}\left(Z^{[l]}\right)$$

where $m$ is the number of training examples in the mini-batch, $Z^{[l]}, A^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$ stack one column per example, and the bias $b^{[l]}$ is broadcast across the columns.
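As a sketch of what this vectorization buys us, the whole layer, across a whole mini-batch, reduces to a single matrix product. The shapes follow the dimensions stated above; the sigmoid activation is once again just an assumed example:

```python
import numpy as np

def sigmoid(Z):
    """Assumed activation function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-Z))

n_prev, n_curr, m = 3, 2, 5            # layer sizes and mini-batch size
A_prev = np.random.randn(n_prev, m)    # A^[l-1], one column per training example
W = np.random.randn(n_curr, n_prev)    # W^[l]
b = np.zeros((n_curr, 1))              # b^[l], broadcast across the m columns

Z = W @ A_prev + b                     # Z^[l] = W^[l] A^[l-1] + b^[l]
A = sigmoid(Z)                         # A^[l] = g^[l](Z^[l])
assert A.shape == (n_curr, m)
```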
We would also like to establish two additional notations:

$$X = A^{[0]}, \qquad \hat{Y} = A^{[L]}$$

where $X$ holds the input features, $\hat{Y}$ holds the predictions, and $L$ denotes the number of layers.
Finally, we are ready to define the cost function:

$$J = \frac{1}{m} \sum_{k=1}^{m} \mathcal{L}\left(\hat{y}^{(k)}, y^{(k)}\right)$$

where $\mathcal{L}$ is the loss of a single training example and $y^{(k)}$ denotes the label of the $k$-th example. Keeping $\mathcal{L}$ unspecified is what lets us support various types of cost functions.
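Since the definition deliberately leaves the per-example loss generic, here is a sketch that plugs in one common choice, the binary cross-entropy loss. That choice, and the small `eps` guard against `log(0)`, are our own assumptions rather than anything the derivation depends on:

```python
import numpy as np

def cost(Y_hat, Y, eps=1e-12):
    """J = (1/m) * sum of per-example losses, here binary cross-entropy."""
    m = Y.shape[1]
    losses = -(Y * np.log(Y_hat + eps) + (1 - Y) * np.log(1 - Y_hat + eps))
    return losses.sum() / m

Y = np.array([[1, 0, 1]])              # labels, one column per example
Y_hat = np.array([[0.9, 0.2, 0.7]])    # predictions A^[L]
print(cost(Y_hat, Y))
```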
We are done with forward propagation! Next up: backward propagation, also known as backpropagation, which involves computing the gradient of the cost function with respect to the weights and biases.
Backward Propagation
We will make heavy use of the chain rule in this section, and to understand better how it works, we first apply the chain rule to the following example:

$$z = f(x, y), \qquad x = x(t), \qquad y = y(t) \qquad \Longrightarrow \qquad \frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}$$

Note that every path from $t$ to $z$ contributes exactly one term to the sum; this is the same pattern we will meet when a node in one layer affects every node in the next.

Great! If we ever get stuck trying to compute or understand some partial derivative, we can always go back to this example.
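As a quick sanity check, here is a sketch that compares the chain-rule result against a numeric derivative. The concrete functions $f(x, y) = xy$, $x(t) = t^2$, and $y(t) = \sin t$ are hypothetical choices of ours, picked only for illustration:

```python
import numpy as np

# Hypothetical functions for checking the chain rule numerically.
x = lambda t: t**2
y = lambda t: np.sin(t)
z = lambda t: x(t) * y(t)              # z = f(x, y) = x * y

t = 1.3
# Chain rule: dz/dt = (dz/dx) dx/dt + (dz/dy) dy/dt = y * 2t + x * cos(t)
analytic = y(t) * 2 * t + x(t) * np.cos(t)
# Central finite difference as an independent reference
h = 1e-6
numeric = (z(t + h) - z(t - h)) / (2 * h)
print(analytic, numeric)               # the two values should agree closely
```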
Now, let us concentrate on the task at hand: the gradient of the cost function with respect to the weights and biases. The natural first stop is the weighted sum, where the chain rule gives

$$\frac{\partial J}{\partial z_i^{[l](k)}} = \frac{\partial J}{\partial a_i^{[l](k)}} \, g^{[l]\prime}\left(z_i^{[l](k)}\right)$$

Vectorization results in

$$\frac{\partial J}{\partial Z^{[l]}} = \begin{bmatrix} \frac{\partial J}{\partial z_1^{[l](1)}} & \cdots & \frac{\partial J}{\partial z_1^{[l](m)}} \\ \vdots & \ddots & \vdots \\ \frac{\partial J}{\partial z_{n^{[l]}}^{[l](1)}} & \cdots & \frac{\partial J}{\partial z_{n^{[l]}}^{[l](m)}} \end{bmatrix}$$

which we can write as

$$\frac{\partial J}{\partial Z^{[l]}} = \frac{\partial J}{\partial A^{[l]}} \odot g^{[l]\prime}\left(Z^{[l]}\right)$$

where $\odot$ denotes element-wise multiplication and $g^{[l]\prime}$, the derivative of the activation function, is applied element-wise as well.
Looking back at the node definition, we see that the weight $w_{ij}^{[l]}$ and the bias $b_i^{[l]}$ enter the cost only through $z_i^{[l](k)}$, with

$$\frac{\partial z_i^{[l](k)}}{\partial w_{ij}^{[l]}} = a_j^{[l-1](k)}, \qquad \frac{\partial z_i^{[l](k)}}{\partial b_i^{[l]}} = 1$$

so the chain rule gives

$$\frac{\partial J}{\partial w_{ij}^{[l]}} = \sum_{k=1}^{m} \frac{\partial J}{\partial z_i^{[l](k)}} \, a_j^{[l-1](k)}, \qquad \frac{\partial J}{\partial b_i^{[l]}} = \sum_{k=1}^{m} \frac{\partial J}{\partial z_i^{[l](k)}}$$

where the sums run over the training examples, since every example contributes to the cost. Next, we present the vectorized version:

$$\frac{\partial J}{\partial W^{[l]}} = \begin{bmatrix} \frac{\partial J}{\partial w_{11}^{[l]}} & \cdots & \frac{\partial J}{\partial w_{1 n^{[l-1]}}^{[l]}} \\ \vdots & \ddots & \vdots \\ \frac{\partial J}{\partial w_{n^{[l]} 1}^{[l]}} & \cdots & \frac{\partial J}{\partial w_{n^{[l]} n^{[l-1]}}^{[l]}} \end{bmatrix}$$

which compresses into

$$\frac{\partial J}{\partial W^{[l]}} = \frac{\partial J}{\partial Z^{[l]}} \left( A^{[l-1]} \right)^{\top}, \qquad \frac{\partial J}{\partial b^{[l]}} = \sum_{k=1}^{m} \frac{\partial J}{\partial z^{[l](k)}}$$

where $\frac{\partial J}{\partial W^{[l]}}$ has the same dimensions as $W^{[l]}$ and $\frac{\partial J}{\partial b^{[l]}}$ has the same dimensions as $b^{[l]}$.
We have already encountered $\frac{\partial J}{\partial z_i^{[l](k)}}$ above, and for the sake of completeness, we also clarify that

$$\frac{\partial z_i^{[l](k)}}{\partial a_j^{[l-1](k)}} = w_{ij}^{[l]}$$

where, once again, the node definition tells us that $w_{ij}^{[l]}$ is the only factor that multiplies $a_j^{[l-1](k)}$. On purpose, we have omitted the details of $g^{[l]\prime}$; they depend on which activation function we choose for the layer.
Furthermore, according to the chain rule, a node in the previous layer picks up a gradient contribution from every node in the current layer:

$$\frac{\partial J}{\partial a_j^{[l-1](k)}} = \sum_{i=1}^{n^{[l]}} \frac{\partial J}{\partial z_i^{[l](k)}} \, w_{ij}^{[l]}$$

As usual, our next step is vectorization:

$$\frac{\partial J}{\partial a^{[l-1](k)}} = \left( W^{[l]} \right)^{\top} \frac{\partial J}{\partial z^{[l](k)}}$$

which we can write as

$$\frac{\partial J}{\partial A^{[l-1]}} = \left( W^{[l]} \right)^{\top} \frac{\partial J}{\partial Z^{[l]}}$$

where $\frac{\partial J}{\partial A^{[l-1]}} \in \mathbb{R}^{n^{[l-1]} \times m}$, ready to seed the backward step through layer $l-1$.
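Putting the three vectorized gradients together, a backward step through one layer could be sketched as follows. The sigmoid derivative stands in for $g^{[l]\prime}$, and the function name `backward_layer` is our own; both are assumptions for illustration:

```python
import numpy as np

def sigmoid_prime(Z):
    """Derivative of the (assumed) sigmoid activation, applied element-wise."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return s * (1.0 - s)

def backward_layer(dA, Z, A_prev, W):
    """Map dJ/dA^[l] to the gradients of layer l and to dJ/dA^[l-1]."""
    dZ = dA * sigmoid_prime(Z)          # dJ/dZ^[l] = dJ/dA^[l] ⊙ g'(Z^[l])
    dW = dZ @ A_prev.T                  # dJ/dW^[l] = dJ/dZ^[l] (A^[l-1])^T
    db = dZ.sum(axis=1, keepdims=True)  # dJ/db^[l], summed over the examples
    dA_prev = W.T @ dZ                  # dJ/dA^[l-1] = (W^[l])^T dJ/dZ^[l]
    return dW, db, dA_prev
```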
Summary
Forward propagation is seeded with the input features:

$$A^{[0]} = X$$

Backward propagation, on the other hand, is seeded with the gradient of the cost with respect to the predictions:

$$\frac{\partial J}{\partial A^{[L]}} = \frac{\partial J}{\partial \hat{Y}}$$
Moreover, let us visualize the inputs we use and the outputs we produce during the forward and backward propagations:
Now, you might have noticed that we have yet to derive an analytic expression for the backpropagation seed $\frac{\partial J}{\partial A^{[L]}}$. It depends on the choice of cost function, which is why we leave its derivation for later in the series.
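To tie the two passes together, here is a sketch of how one complete sweep over an $L$-layer network could look, reusing the `sigmoid` and `backward_layer` helpers sketched above. Since we have not derived the seed yet, it is passed in as a given via the hypothetical `seed_grad_fn` callback; the name `propagate` is likewise our own invention:

```python
def propagate(X, params, seed_grad_fn):
    """One forward and one backward sweep; params[l] = (W, b) for layer l."""
    A, cache = X, []                      # forward propagation is seeded with A^[0] = X
    for W, b in params:
        Z = W @ A + b                     # Z^[l] = W^[l] A^[l-1] + b^[l]
        cache.append((Z, A, W))           # remember what backprop will need
        A = sigmoid(Z)                    # A^[l] = g^[l](Z^[l]), assumed activation
    dA = seed_grad_fn(A)                  # backprop is seeded with dJ/dA^[L]
    grads = []
    for Z, A_prev, W in reversed(cache):  # walk the layers in reverse
        dW, db, dA = backward_layer(dA, Z, A_prev, W)
        grads.append((dW, db))
    return A, grads[::-1]                 # predictions and per-layer gradients
```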
Last but not least: congratulations! You have made it to the end (of the first post). 🏅