标量对向量求导的常用数学公式
设标量函数 y = f ( x ) y = f(\boldsymbol{x}) y=f(x),其中 x = ( x 1 , x 2 , ⋯ , x n ) T \boldsymbol{x} = (x_1, x_2, \cdots, x_n)^{\rm T} x=(x1,x2,⋯,xn)T是一个 n n n维列向量。标量 y y y对向量 x \boldsymbol{x} x的导数为一个 n n n维列向量:
∂ y ∂ x = [ ∂ y ∂ x 1 ∂ y ∂ x 2 ⋮ ∂ y ∂ x n ] \frac{\partial y}{\partial \boldsymbol{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix} ∂x∂y= ∂x1∂y∂x2∂y⋮∂xn∂y
- 线性函数:若 y = a T x y = \boldsymbol{a}^{\rm T} \boldsymbol{x} y=aTx,其中 a \boldsymbol{a} a是一个 n n n维列向量,则
∂ y ∂ x = a \frac{\partial y}{\partial \boldsymbol{x}} = \boldsymbol{a} ∂x∂y=a
- 二次型函数:若 y = x T A x y = \boldsymbol{x}^{\rm T} {\bm A} \boldsymbol{x} y=xTAx,其中 A {\bm A} A是一个 n × n n \times n n×n的矩阵,则
∂ y ∂ x = ( A + A T ) x \frac{\partial y}{\partial \boldsymbol{x}} = ({\bm A} + {\bm A}^{\rm T}) \boldsymbol{x} ∂x∂y=(A+AT)x
当 A {\bm A} A为对称矩阵时, A T = A {\bm A}^{\rm T} = {\bm A} AT=A,则
∂ y ∂ x = 2 A x \frac{\partial y}{\partial \boldsymbol{x}} = 2{\bm A} \boldsymbol{x} ∂x∂y=2Ax
当 A {\bm A} A为单位矩阵时, y = x T x y = \boldsymbol{x}^{\rm T} \boldsymbol{x} y=xTx,则
∂ y ∂ x = ∂ ∥ x ∥ 2 ∂ x = ∂ x T x ∂ x = 2 x \frac{\partial y}{\partial \boldsymbol{x}} = \frac{\partial \|{\bm x}\|^2}{\partial {\bm x}} = \frac{\partial {\bm x}^{\rm T} {\bm x}}{\partial {\bm x}} =2{\bm x} ∂x∂y=∂x∂∥x∥2=∂x∂xTx=2x
∥ x ∥ 2 \|{\bm x}\|^2 ∥x∥2表示向量 x {\bm x} x的范数(长度)的平方。
向量对向量求导的常用数学公式
若 y = A x {\bm y}= {\bm A} \boldsymbol{x} y=Ax,其中 A {\bm A} A是一个 n × n n \times n n×n的矩阵,则
∂ y ∂ x = ∂ A x ∂ x = A T \frac{\partial {\bm y}}{\partial \boldsymbol{x}} = \frac{\partial {\bm A}{\bm x}}{\partial {\bm x}} = {\bm A}^{\rm T} ∂x∂y=∂x∂Ax=AT
A {\bm A} A是一个矩阵, x {\bm x} x是一个向量。
对 x {\bm x} x求导的结果是矩阵 A {\bm A} A的转置 A T {\bm A}^{\rm T} AT。
复合函数的导数
给定函数 g ( u ( x ) ) g(u(x)) g(u(x)),其中 u = u ( x ) = b − A x {\bm u}=u({\bm x}) = {\bm b} - {\bm A}{\bm x} u=u(x)=b−Ax,且 g ( u ) = ∥ u ∥ 2 g({\bm u}) = \|{\bm u}\|^2 g(u)=∥u∥2。
链式法则
根据链式法则(Chain Rule),有:
∂ g ( u ( x ) ) ∂ x = ∂ g ∂ u ⋅ ∂ u ∂ x \frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial g}{\partial {\bm u}} \cdot \frac{\partial {\bm u}}{\partial {\bm x}} ∂x∂g(u(x))=∂u∂g⋅∂x∂u
具体步骤
-
计算 ∂ u ∂ x \dfrac{\partial {\bm u}}{\partial {\bm x}} ∂x∂u:
u ( x ) = b − A x {\bm u}({\bm x}) = {\bm b} - {\bm A}{\bm x} u(x)=b−Ax
对 x {\bm x} x求导得到:
∂ u ∂ x = − A \frac{\partial {\bm u}}{\partial {\bm x}} = -{\bm A} ∂x∂u=−A -
计算 ∂ g ( u ) ∂ u \dfrac{\partial g({\bm u})}{\partial {\bm u}} ∂u∂g(u):
g ( u ) = ∥ u ∥ 2 = u T u g({\bm u}) = \|{\bm u}\|^2 = {\bm u}^{\rm T} {\bm u} g(u)=∥u∥2=uTu
对 u {\bm u} u求导得到:
∂ g ( u ) ∂ u = 2 u \frac{\partial g({\bm u})}{\partial {\bm u}} = 2{\bm u} ∂u∂g(u)=2u -
应用链式法则:
∂ g ( u ( x ) ) ∂ x = ∂ g ( u ) ∂ u ⋅ ∂ u ∂ x \frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial g({\bm u})}{\partial {\bm u}} \cdot \frac{\partial {\bm u}}{\partial {\bm x}} ∂x∂g(u(x))=∂u∂g(u)⋅∂x∂u
将上面的结果代入:
∂ g ( u ( x ) ) ∂ x = 2 u ⋅ ( − A ) \frac{\partial g({\bm u}({\bm x}))}{\partial {\bm x}} = 2{\bm u} \cdot (-{\bm A}) ∂x∂g(u(x))=2u⋅(−A)
由于 u = b − A x {\bm u} = {\bm b} - {\bm A}{\bm x} u=b−Ax,代入得到:
∂ g ( u ( x ) ) ∂ x = − 2 A T ( b − A x ) \frac{\partial g({u}({\bm x}))}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x}) ∂x∂g(u(x))=−2AT(b−Ax)
最终结果是:
∂ ∥ b − A x ∥ 2 ∂ x = − 2 A T ( b − A x ) \frac{\partial \|{\bm b} - {\bm A}{\bm x}\|^2}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x}) ∂x∂∥b−Ax∥2=−2AT(b−Ax)