Gradients of the Loss Function L with Respect to W, X, and b of a Fully Connected Layer
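Assumptions: $X$ has shape $s\times n$, where $s$ is the number of samples and each sample is flattened into a $1\times n$ row vector; $W$ has shape $n\times o$, where $o$ is the output dimension of the fully connected layer; $b$ has shape $1\times o$; $Z$ has shape $s\times o$, with $Z=X\times W+b$, where $b$ is added to every row of $X\times W$.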
$$Z=\begin{bmatrix} z_{11} & \cdots & z_{1o} \\ \vdots & \ddots & \vdots \\ z_{s1} & \cdots & z_{so} \end{bmatrix}$$
$$X=\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{s1} & \cdots & x_{sn} \end{bmatrix}$$
$$W=\begin{bmatrix} w_{11} & \cdots & w_{1o} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{no} \end{bmatrix}$$
$$b=\begin{bmatrix} b_{1} & \cdots & b_{o} \end{bmatrix}$$
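The shape conventions above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `dense_forward` and the sizes `s, n, o = 4, 3, 2` are assumptions chosen only for the example.

```python
import numpy as np

def dense_forward(X, W, b):
    """Compute Z = X W + b for a batch of row-vector samples."""
    # X: (s, n), W: (n, o), b: (1, o); NumPy broadcasts b over the s rows.
    return X @ W + b

rng = np.random.default_rng(0)
s, n, o = 4, 3, 2  # hypothetical sizes, for illustration only
X = rng.standard_normal((s, n))
W = rng.standard_normal((n, o))
b = rng.standard_normal((1, o))

Z = dense_forward(X, W, b)
print(Z.shape)  # (4, 2)
```

Each entry `Z[i, j]` equals the sum over `t` of `X[i, t] * W[t, j]`, plus `b[0, j]`, which is the element-wise form used in the proofs.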
Gradient of the loss function L with respect to W:
$$\frac{\partial L}{\partial W}=X^T\times\frac{\partial L}{\partial Z}$$
$\textbf{Proof:}$
Since
$$z_{ij}=\sum_{t=1}^n x_{it}w_{tj}+b_j$$
we have
$$\frac{\partial z_{ij}}{\partial w_{kl}}= \begin{cases} x_{ik}, & \text{if } j=l \\ 0, & \text{if } j \neq l \end{cases}$$
Therefore, by the chain rule,
$$\frac{\partial L}{\partial w_{kl}} = \sum_{i=1}^s \sum_{j=1}^{o} \frac{\partial L}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial w_{kl}} =\sum_{i=1}^s\frac{\partial L}{\partial z_{il}}x_{ik}$$
On the other hand,
$$X^T\times\frac{\partial L}{\partial Z}= \begin{bmatrix} x_{11} & \cdots & x_{s1} \\ \vdots & \ddots & \vdots \\ x_{1n} & \cdots & x_{sn} \end{bmatrix} \times \begin{bmatrix} \frac{\partial L}{\partial z_{11}} & \cdots & \frac{\partial L}{\partial z_{1o}} \\ \vdots & \ddots & \vdots \\ \frac{\partial L}{\partial z_{s1}} & \cdots & \frac{\partial L}{\partial z_{so}} \end{bmatrix}$$
so its $(k,l)$ entry is
$$\left(X^T\times\frac{\partial L}{\partial Z}\right)_{kl}=\sum_{i=1}^s\frac{\partial L}{\partial z_{il}}x_{ik}=\frac{\partial L}{\partial w_{kl}}$$
Hence
$$\frac{\partial L}{\partial W}=X^T\times\frac{\partial L}{\partial Z}$$
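One way to sanity-check this result is a finite-difference comparison. The sketch below assumes a hypothetical loss $L=\frac{1}{2}\sum_{i,j} z_{ij}^2$ (so that $\partial L/\partial Z = Z$); the loss, the helper name `numerical_grad_W`, and the array sizes are illustrative choices, not part of the derivation.

```python
import numpy as np

def loss(X, W, b):
    # Hypothetical loss for the check: L = 0.5 * sum(Z**2), so dL/dZ = Z.
    Z = X @ W + b
    return 0.5 * np.sum(Z ** 2)

def numerical_grad_W(X, W, b, eps=1e-6):
    # Central differences, perturbing one entry w_kl at a time.
    grad = np.zeros_like(W)
    for k in range(W.shape[0]):
        for l in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[k, l] += eps
            W_minus[k, l] -= eps
            grad[k, l] = (loss(X, W_plus, b) - loss(X, W_minus, b)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))  # s = 4, n = 3
W = rng.standard_normal((3, 2))  # n = 3, o = 2
b = rng.standard_normal((1, 2))

dZ = X @ W + b                   # dL/dZ for the loss assumed above
dW_analytic = X.T @ dZ           # the formula being checked, shape (n, o)
dW_numeric = numerical_grad_W(X, W, b)
max_err_W = np.max(np.abs(dW_analytic - dW_numeric))
print(max_err_W)                 # expected to be tiny (rounding error only)
```

Note that `dW_analytic` has the same shape as `W`, as the index computation in the proof requires.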
Gradient of the loss function L with respect to X:
$$\frac{\partial L}{\partial X}=\frac{\partial L}{\partial Z}\times W^T$$
$\textbf{Proof:}$
By the chain rule,
$$\frac{\partial L}{\partial x_{ij}}=\sum_{l=1}^s\sum_{k=1}^o\frac{\partial L}{\partial z_{lk}}\frac{\partial z_{lk}}{\partial x_{ij}}$$
Since
$$z_{lk}=\sum_{t=1}^n x_{lt}w_{tk}+b_k$$
we have
$$\frac{\partial z_{lk}}{\partial x_{ij}}= \begin{cases} w_{jk}, & \text{if } l=i \\ 0, & \text{if } l \neq i \end{cases}$$
Therefore
$$\frac{\partial L}{\partial x_{ij}}=\sum_{k=1}^o\frac{\partial L}{\partial z_{ik}}w_{jk}$$
On the other hand,
$$\frac{\partial L}{\partial Z}\times W^T= \begin{bmatrix} \frac{\partial L}{\partial z_{11}} & \cdots & \frac{\partial L}{\partial z_{1o}} \\ \vdots & \ddots & \vdots \\ \frac{\partial L}{\partial z_{s1}} & \cdots & \frac{\partial L}{\partial z_{so}} \end{bmatrix} \times \begin{bmatrix} w_{11} & \cdots & w_{n1} \\ \vdots & \ddots & \vdots \\ w_{1o} & \cdots & w_{no} \end{bmatrix}$$
so its $(i,j)$ entry is
$$\left(\frac{\partial L}{\partial Z}\times W^T\right)_{ij}=\sum_{k=1}^o\frac{\partial L}{\partial z_{ik}}w_{jk}=\frac{\partial L}{\partial x_{ij}}$$
Hence
$$\frac{\partial L}{\partial X}=\frac{\partial L}{\partial Z}\times W^T$$
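The same kind of finite-difference check can be applied to the gradient with respect to $X$. As before, this is only a sketch: the loss $L=\frac{1}{2}\sum_{i,j} z_{ij}^2$, the helper name `numerical_grad_X`, and the sizes are assumptions made for the example.

```python
import numpy as np

def loss(X, W, b):
    # Hypothetical loss for the check: L = 0.5 * sum(Z**2), so dL/dZ = Z.
    Z = X @ W + b
    return 0.5 * np.sum(Z ** 2)

def numerical_grad_X(X, W, b, eps=1e-6):
    # Central differences, perturbing one entry x_ij at a time.
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            X_plus, X_minus = X.copy(), X.copy()
            X_plus[i, j] += eps
            X_minus[i, j] -= eps
            grad[i, j] = (loss(X_plus, W, b) - loss(X_minus, W, b)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))  # s = 4, n = 3
W = rng.standard_normal((3, 2))  # n = 3, o = 2
b = rng.standard_normal((1, 2))

dZ = X @ W + b                   # dL/dZ for the loss assumed above
dX_analytic = dZ @ W.T           # the formula being checked, shape (s, n)
dX_numeric = numerical_grad_X(X, W, b)
max_err_X = np.max(np.abs(dX_analytic - dX_numeric))
print(max_err_X)                 # expected to be tiny (rounding error only)
```

In a multi-layer network, `dX_analytic` is the quantity passed back to the previous layer as its own $\partial L/\partial Z$ (after that layer's activation derivative, if any).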
Gradient of the loss function L with respect to b:
$$\frac{\partial L}{\partial b}=\text{sum}\left(\frac{\partial L}{\partial Z},\ \text{axis}=0\right)$$
that is, the column-wise sum of $\frac{\partial L}{\partial Z}$.
$\textbf{Proof:}$
Since $z_{ij}=\sum_{t=1}^n x_{it}w_{tj}+b_j$, the derivative $\frac{\partial z_{ij}}{\partial b_k}$ equals 1 if $j=k$ and 0 otherwise, so
$$\frac{\partial L}{\partial b_{k}} = \sum_{i=1}^s \sum_{j=1}^{o} \frac{\partial L}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial b_{k}} =\sum_{i=1}^s\frac{\partial L}{\partial z_{ik}}$$
Hence
$$\frac{\partial L}{\partial b}=\mathbf{1}_s\frac{\partial L}{\partial Z}=\text{sum}\left(\frac{\partial L}{\partial Z},\ \text{axis}=0\right)$$
where $\mathbf{1}_s$ is the $1\times s$ row vector whose entries are all 1.
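The equivalence between multiplying by the all-ones row vector and summing over `axis=0` can be illustrated in NumPy. In this sketch, `dZ` is a random stand-in for $\partial L/\partial Z$ and the sizes are arbitrary; `keepdims=True` is used only so that the result keeps the $1\times o$ shape assumed for $b$.

```python
import numpy as np

rng = np.random.default_rng(0)
s, o = 5, 3                       # hypothetical sizes, for illustration only
dZ = rng.standard_normal((s, o))  # stand-in for dL/dZ, shape (s, o)

ones_row = np.ones((1, s))        # the all-ones row vector 1_s, shape (1, s)
db_matmul = ones_row @ dZ                    # shape (1, o)
db_sum = np.sum(dZ, axis=0, keepdims=True)   # column-wise sum, shape (1, o)

print(db_matmul.shape, db_sum.shape)   # (1, 3) (1, 3)
print(np.allclose(db_matmul, db_sum))  # True
```

Without `keepdims=True`, `np.sum(dZ, axis=0)` returns a one-dimensional array of shape `(o,)`, which holds the same values but not the row-vector shape.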