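Contents
- 1 📖 Introduction to the Schur complement
- 1-1 🔖 Definition of the Schur complement
- 1-2 🔖 Derivation of the Schur complement theorem
- 1-3 🔖 Application: fast matrix inversion
- 1-4 🔖 Application: the Schur complement in solving for the information matrix
- 1-5 🔖 Application: the Schur complement and the multivariate Gaussian distribution
- Decomposing a multivariate Gaussian with the Schur complement
- Information matrices of the marginal and conditional distributions
- Summary
- References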
1 📖 Introduction to the Schur complement
1-1 🔖 Definition of the Schur complement
Consider an arbitrary block matrix $\mathbf{M}$ of the form:
$$\mathbf{M}=\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}$$
- If the block $\mathbf{D}$ is invertible, then $\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}$ is called the Schur complement of $\mathbf{D}$ in $\mathbf{M}$.
- If the block $\mathbf{A}$ is invertible, then $\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ is called the Schur complement of $\mathbf{A}$ in $\mathbf{M}$.
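The two definitions can be checked numerically. The following is a minimal NumPy sketch; the function names `schur_complement_of_A` and `schur_complement_of_D` and the small example blocks are illustrative choices, not part of the original text.

```python
import numpy as np

def schur_complement_of_A(A, B, C, D):
    """Return D - C A^{-1} B, the Schur complement of A in M (A must be invertible)."""
    return D - C @ np.linalg.solve(A, B)

def schur_complement_of_D(A, B, C, D):
    """Return A - B D^{-1} C, the Schur complement of D in M (D must be invertible)."""
    return A - B @ np.linalg.solve(D, C)

# Illustrative blocks: A is 2x2, D is 1x1, so M is 3x3.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0], [2.0]])
C = np.array([[0.0, 1.0]])
D = np.array([[5.0]])

delta_A = schur_complement_of_A(A, B, C, D)  # 1x1 matrix
delta_D = schur_complement_of_D(A, B, C, D)  # 2x2 matrix
print(delta_A)
print(delta_D)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a design choice of this sketch; it computes the same quantity.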
1-2 🔖 Derivation of the Schur complement theorem
Assuming $\mathbf{A}$ is invertible, $\mathbf{M}$ can be reduced to block upper-triangular or block lower-triangular form:
$$\begin{aligned} \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} &= \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{bmatrix} \\ \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}\begin{bmatrix} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} &= \begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{C} & \Delta_{\mathbf{A}} \end{bmatrix} \end{aligned}$$
where $\Delta_{\mathbf{A}}=\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}$. Applying both operations together reduces $\mathbf{M}$ to block-diagonal form:
$$\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}\begin{bmatrix} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}=\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{bmatrix}$$
Conversely, $\mathbf{M}$ can be recovered from the block-diagonal form:
$$\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}=\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}$$
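The triangular, diagonal, and recovery identities above can be verified on a random example. This is a sketch under stated assumptions: the block sizes, the random seed, and the variable names (`left`, `right`, `block_diag`, and so on) are illustrative, and $\mathbf{A}$ is built as a symmetric positive definite matrix only to guarantee that it is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # illustrative block sizes

G = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)           # symmetric positive definite, hence invertible
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))
M = np.block([[A, B], [C, D]])

A_inv = np.linalg.inv(A)
delta_A = D - C @ A_inv @ B       # Schur complement of A in M

I_n, I_m = np.eye(n), np.eye(m)
Z_nm, Z_mn = np.zeros((n, m)), np.zeros((m, n))

# The left factor eliminates C; the right factor eliminates B.
left = np.block([[I_n, Z_nm], [-C @ A_inv, I_m]])
right = np.block([[I_n, -A_inv @ B], [Z_mn, I_m]])
block_diag = np.block([[A, Z_nm], [Z_mn, delta_A]])

upper = left @ M                  # block upper-triangular form
lower = M @ right                 # block lower-triangular form
diagonalized = left @ M @ right   # block-diagonal form

# Flipping the sign of the off-diagonal block inverts each factor.
left_inv = np.block([[I_n, Z_nm], [C @ A_inv, I_m]])
right_inv = np.block([[I_n, A_inv @ B], [Z_mn, I_m]])
recovered = left_inv @ block_diag @ right_inv

print(np.allclose(diagonalized, block_diag), np.allclose(recovered, M))
```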
1-3 🔖 Application: fast matrix inversion
The matrix $\mathbf{M}$ can be written as:
$$\mathbf{M}=\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}=\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$$
Therefore, provided $\Delta_{\mathbf{A}}$ is also invertible, inverting each factor and reversing the order gives:
$$\mathbf{M}^{-1}=\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}^{-1}=\begin{bmatrix} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{A}^{-1} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}}^{-1} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}$$
Hint: a unit triangular block factor is inverted by flipping the sign of its off-diagonal block:
$$\begin{bmatrix} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}=\mathbf{I}$$
Multiplying out the three factors gives the final result:
$$\mathbf{M}^{-1}=\begin{bmatrix} \mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{B}\Delta_{\mathbf{A}}^{-1}\mathbf{C}\mathbf{A}^{-1} & -\mathbf{A}^{-1}\mathbf{B}\Delta_{\mathbf{A}}^{-1} \\ -\Delta_{\mathbf{A}}^{-1}\mathbf{C}\mathbf{A}^{-1} & \Delta_{\mathbf{A}}^{-1} \end{bmatrix}$$
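The closed-form inverse can be compared against a general-purpose inverse. The sketch below assumes NumPy; `block_inverse` is a hypothetical helper name, and the test matrix is made symmetric positive definite only so that both $\mathbf{A}$ and $\Delta_{\mathbf{A}}$ are guaranteed to be invertible.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert M = [[A, B], [C, D]] using the Schur complement of A.

    Assumes both A and delta_A = D - C A^{-1} B are invertible.
    """
    A_inv = np.linalg.inv(A)
    delta_A = D - C @ A_inv @ B
    delta_inv = np.linalg.inv(delta_A)
    top_left = A_inv + A_inv @ B @ delta_inv @ C @ A_inv
    top_right = -A_inv @ B @ delta_inv
    bottom_left = -delta_inv @ C @ A_inv
    return np.block([[top_left, top_right], [bottom_left, delta_inv]])

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
M = G @ G.T + np.eye(5)   # SPD test matrix, so A and delta_A are invertible
n = 3                     # size of the leading block A (illustrative)

M_inv = block_inverse(M[:n, :n], M[:n, n:], M[n:, :n], M[n:, n:])
print(np.allclose(M_inv, np.linalg.inv(M)))
```

The formula only inverts the two smaller matrices $\mathbf{A}$ and $\Delta_{\mathbf{A}}$, so the saving is largest when $\mathbf{A}$ is cheap to invert, for example when it is diagonal.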
1-4 🔖 Application: the Schur complement in solving for the information matrix
Suppose the information matrix is known. By the Schur complement formula, the blocks of the covariance matrix and the blocks of the information matrix are related as follows.
Covariance matrix:
$$\mathbf{\Sigma}=\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}$$
Corresponding information matrix:
$$\mathbf{\Sigma}^{-1}=\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}^{-1}=\begin{bmatrix} A^{-1}+A^{-1}C^{\top}\Delta_{A}^{-1}CA^{-1} & -A^{-1}C^{\top}\Delta_{A}^{-1} \\ -\Delta_{A}^{-1}CA^{-1} & \Delta_{A}^{-1} \end{bmatrix} \triangleq \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix}$$
Note: the middle step is the Schur complement inversion. It substitutes the result of the previous subsection directly, with $B=C^{\top}$; see that subsection for the derivation.
Here:
$$\Delta_{A}=D-CA^{-1}C^{\top}$$
Matching the blocks term by term gives:
$$\Delta_{A}^{-1}=\Lambda_{bb}, \qquad A^{-1}=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba}$$
or, exchanging the roles of the two blocks:
$$D^{-1}=\Lambda_{bb}-\Lambda_{ba}\Lambda_{aa}^{-1}\Lambda_{ab}$$
The matrix $A^{-1}$ or $D^{-1}$ obtained here is the prior information matrix used in the next round of optimization (also known as the information matrix of the marginal distribution).
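The two marginalization formulas can be sketched as follows. This is an illustration rather than a definitive implementation: the helper names `marginalize_b` and `marginalize_a`, the block sizes, and the random covariance are assumptions of this example.

```python
import numpy as np

def marginalize_b(Lambda, na):
    """Information matrix of the marginal over a (the first `na` variables).

    Implements A^{-1} = L_aa - L_ab L_bb^{-1} L_ba.
    """
    L_aa, L_ab = Lambda[:na, :na], Lambda[:na, na:]
    L_ba, L_bb = Lambda[na:, :na], Lambda[na:, na:]
    return L_aa - L_ab @ np.linalg.solve(L_bb, L_ba)

def marginalize_a(Lambda, na):
    """Information matrix of the marginal over b (the remaining variables).

    Implements D^{-1} = L_bb - L_ba L_aa^{-1} L_ab.
    """
    L_aa, L_ab = Lambda[:na, :na], Lambda[:na, na:]
    L_ba, L_bb = Lambda[na:, :na], Lambda[na:, na:]
    return L_bb - L_ba @ np.linalg.solve(L_aa, L_ab)

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
Sigma = G @ G.T + np.eye(5)      # SPD covariance with blocks A, C^T, C, D
Lambda = np.linalg.inv(Sigma)    # the known information matrix
na = 2                           # size of the a block (illustrative)

prior_a = marginalize_b(Lambda, na)   # should match the inverse of A
prior_b = marginalize_a(Lambda, na)   # should match the inverse of D
print(np.allclose(prior_a, np.linalg.inv(Sigma[:na, :na])))
print(np.allclose(prior_b, np.linalg.inv(Sigma[na:, na:])))
```

Both functions work purely on the information matrix, so the covariance never has to be formed; `Sigma` appears here only to check the result.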
Other notes
For how the marginal and conditional distributions are used, see the next subsection; its conclusions are stated here in advance.
$$P(\boldsymbol{a}, \boldsymbol{b})=\mathcal{N}\left(\begin{bmatrix} \boldsymbol{\mu}_{a} \\ \boldsymbol{\mu}_{b} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{bmatrix}\right)=\mathcal{N}^{-1}\left(\begin{bmatrix} \boldsymbol{\eta}_{a} \\ \boldsymbol{\eta}_{b} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Lambda}_{aa} & \boldsymbol{\Lambda}_{ab} \\ \boldsymbol{\Lambda}_{ba} & \boldsymbol{\Lambda}_{bb} \end{bmatrix}\right)$$
and
$$\begin{array}{|c|c|c|} \hline & \text{Marginal} & \text{Conditional} \\ & p(\boldsymbol{a})=\int p(\boldsymbol{a}, \boldsymbol{b})\, d\boldsymbol{b} & p(\boldsymbol{a} \mid \boldsymbol{b})=p(\boldsymbol{a}, \boldsymbol{b}) / p(\boldsymbol{b}) \\ \hline \text{Covariance form} & \boldsymbol{\mu}=\boldsymbol{\mu}_{a} & \boldsymbol{\mu}^{\prime}=\boldsymbol{\mu}_{a}+\Sigma_{ab}\Sigma_{bb}^{-1}\left(\boldsymbol{b}-\boldsymbol{\mu}_{b}\right) \\ & \Sigma=\Sigma_{aa} & \Sigma^{\prime}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \\ \hline \text{Information form} & \begin{array}{c} \boldsymbol{\eta}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\Lambda_{bb}^{-1}\boldsymbol{\eta}_{b} \\ \Lambda=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba} \end{array} & \begin{array}{c} \boldsymbol{\eta}^{\prime}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\boldsymbol{b} \\ \Lambda^{\prime}=\Lambda_{aa} \end{array} \\ \hline \end{array}$$
1-5 🔖 Application: the Schur complement and the multivariate Gaussian distribution
Decomposing a multivariate Gaussian with the Schur complement
Suppose the multivariate variable $\mathbf{x}$ follows a zero-mean Gaussian distribution and consists of two parts, $\mathbf{x}=\begin{bmatrix} a \\ b \end{bmatrix}$, with covariance matrix:
$$\mathbf{K}=\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}$$
where $A=\operatorname{cov}(a, a)$, $D=\operatorname{cov}(b, b)$, and $C=\operatorname{cov}(b, a)$, so that $C^{\top}=\operatorname{cov}(a, b)$. The probability distribution of $\mathbf{x}$ is then:
$$P(a, b)=P(a)\,P(b \mid a) \propto \exp\left(-\frac{1}{2}\begin{bmatrix} a \\ b \end{bmatrix}^{\top}\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}^{-1}\begin{bmatrix} a \\ b \end{bmatrix}\right)$$
Applying the inversion formula from the Schur complement subsection decomposes the Gaussian:
$$\begin{aligned} P(a, b) &\propto \exp\left(-\frac{1}{2}\begin{bmatrix} a \\ b \end{bmatrix}^{\top}\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}^{-1}\begin{bmatrix} a \\ b \end{bmatrix}\right) \\ &\propto \exp\left(-\frac{1}{2}\begin{bmatrix} a \\ b \end{bmatrix}^{\top}\begin{bmatrix} I & -A^{-1}C^{\top} \\ 0 & I \end{bmatrix}\begin{bmatrix} A^{-1} & 0 \\ 0 & \Delta_{A}^{-1} \end{bmatrix}\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix}\right) \\ &\propto \exp\left(-\frac{1}{2}\begin{bmatrix} a^{\top} & \left(b-CA^{-1}a\right)^{\top} \end{bmatrix}\begin{bmatrix} A^{-1} & 0 \\ 0 & \Delta_{A}^{-1} \end{bmatrix}\begin{bmatrix} a \\ b-CA^{-1}a \end{bmatrix}\right) \\ &\propto \exp\left(-\frac{1}{2}\left[a^{\top}A^{-1}a+\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right]\right) \\ &\propto \underbrace{\exp\left(-\frac{1}{2}a^{\top}A^{-1}a\right)}_{p(a)}\ \underbrace{\exp\left(-\frac{1}{2}\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right)}_{p(b \mid a)} \end{aligned}$$
This means the joint Gaussian $P(a, b)$ factors into the marginal distribution $p(a)$ and the conditional distribution $p(b \mid a)$.
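The key step of the decomposition is that the joint quadratic form splits into a marginal term and a conditional term. A small numerical check of that identity, assuming NumPy and an arbitrary random covariance (the sizes, seed, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
na, nb = 2, 3                      # illustrative sizes of a and b
G = rng.standard_normal((na + nb, na + nb))
K = G @ G.T + np.eye(na + nb)      # SPD joint covariance [[A, C^T], [C, D]]
A, C, D = K[:na, :na], K[na:, :na], K[na:, na:]

A_inv = np.linalg.inv(A)
delta_A = D - C @ A_inv @ C.T      # Schur complement of A

x = rng.standard_normal(na + nb)   # arbitrary evaluation point
a, b = x[:na], x[na:]

# Exponent of the joint density (without the factor -1/2).
joint_quad = x @ np.linalg.solve(K, x)

# Exponents of p(a) and p(b | a) (without the factor -1/2).
marginal_quad = a @ A_inv @ a
residual = b - C @ A_inv @ a
conditional_quad = residual @ np.linalg.solve(delta_A, residual)

print(np.isclose(joint_quad, marginal_quad + conditional_quad))
```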
Information matrices of the marginal and conditional distributions
Suppose the information matrix is known:
$$\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}^{-1}=\begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix}$$
By the Schur complement inversion formula, the blocks of the covariance matrix and the blocks of the information matrix are related by:
$$\begin{bmatrix} A & C^{\top} \\ C & D \end{bmatrix}^{-1}=\begin{bmatrix} A^{-1}+A^{-1}C^{\top}\Delta_{A}^{-1}CA^{-1} & -A^{-1}C^{\top}\Delta_{A}^{-1} \\ -\Delta_{A}^{-1}CA^{-1} & \Delta_{A}^{-1} \end{bmatrix} \triangleq \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix}$$
The conditional distribution $P(b \mid a)$ has covariance $\Delta_{A}$, so the relation above gives its information matrix directly:
$$\Delta_{A}^{-1}=\Lambda_{bb}$$
The marginal distribution $P(a)$ has covariance $A$, so the relation above gives its information matrix:
$$A^{-1}=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba}$$
Summary
On $P(a)$
$$\begin{aligned} P(a) &= \int P(a, b)\, db \\ P(a) &\propto \exp\left(-\frac{1}{2}a^{\top}A^{-1}a\right) \sim \mathcal{N}(0, A) \end{aligned}$$
Takeaway: the covariance of the marginal distribution is simply the corresponding block of the joint covariance matrix.
On $P(b \mid a)$
$$P(b \mid a) \propto \exp\left(-\frac{1}{2}\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right)$$
Takeaway: $P(b \mid a) \sim \mathcal{N}\left(CA^{-1}a, \Delta_{A}\right)$. The covariance becomes the Schur complement of $A$, and the mean shifts from $0$ to $CA^{-1}a$.
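The conditional distribution can be computed directly from the covariance blocks. A minimal sketch, assuming a zero-mean joint Gaussian ordered as $[a; b]$; the function name `condition_b_on_a` and the 2×2 example covariance are hypothetical.

```python
import numpy as np

def condition_b_on_a(K, na, a_obs):
    """Mean and covariance of b given a = a_obs for a zero-mean joint Gaussian.

    K is the joint covariance [[A, C^T], [C, D]] with a occupying the first
    `na` entries. Returns (C A^{-1} a_obs, D - C A^{-1} C^T).
    """
    A = K[:na, :na]
    C = K[na:, :na]
    D = K[na:, na:]
    mean = C @ np.linalg.solve(A, a_obs)
    cov = D - C @ np.linalg.solve(A, C.T)   # Schur complement of A
    return mean, cov

# Illustrative two-variable example: var(a) = var(b) = 2, cov(b, a) = 1.
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])
mean, cov = condition_b_on_a(K, 1, np.array([1.0]))
print(mean, cov)

# The conditional covariance is the inverse of the Lambda_bb block.
Lambda = np.linalg.inv(K)
print(np.allclose(np.linalg.inv(cov), Lambda[1:, 1:]))
```

For this example the conditional mean is 0.5 and the conditional variance is 1.5, smaller than the prior variance of 2 because observing $a$ removes part of the uncertainty in $b$.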
Finally
$$P(\boldsymbol{a}, \boldsymbol{b})=\mathcal{N}\left(\begin{bmatrix} \boldsymbol{\mu}_{a} \\ \boldsymbol{\mu}_{b} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{bmatrix}\right)=\mathcal{N}^{-1}\left(\begin{bmatrix} \boldsymbol{\eta}_{a} \\ \boldsymbol{\eta}_{b} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Lambda}_{aa} & \boldsymbol{\Lambda}_{ab} \\ \boldsymbol{\Lambda}_{ba} & \boldsymbol{\Lambda}_{bb} \end{bmatrix}\right)$$
and
$$\begin{array}{|c|c|c|} \hline & \text{Marginal} & \text{Conditional} \\ & p(\boldsymbol{a})=\int p(\boldsymbol{a}, \boldsymbol{b})\, d\boldsymbol{b} & p(\boldsymbol{a} \mid \boldsymbol{b})=p(\boldsymbol{a}, \boldsymbol{b}) / p(\boldsymbol{b}) \\ \hline \text{Covariance form} & \boldsymbol{\mu}=\boldsymbol{\mu}_{a} & \boldsymbol{\mu}^{\prime}=\boldsymbol{\mu}_{a}+\Sigma_{ab}\Sigma_{bb}^{-1}\left(\boldsymbol{b}-\boldsymbol{\mu}_{b}\right) \\ & \Sigma=\Sigma_{aa} & \Sigma^{\prime}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \\ \hline \text{Information form} & \begin{array}{c} \boldsymbol{\eta}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\Lambda_{bb}^{-1}\boldsymbol{\eta}_{b} \\ \Lambda=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba} \end{array} & \begin{array}{c} \boldsymbol{\eta}^{\prime}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\boldsymbol{b} \\ \Lambda^{\prime}=\Lambda_{aa} \end{array} \\ \hline \end{array}$$
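The covariance-form and information-form rows of the table describe the same distributions, which can be checked numerically. The sketch below assumes the standard relation $\boldsymbol{\eta}=\Lambda\boldsymbol{\mu}$ between the information vector and the mean, which the text does not state explicitly; the sizes, seed, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
na, nb = 2, 2                          # illustrative sizes of a and b
G = rng.standard_normal((na + nb, na + nb))
Sigma = G @ G.T + np.eye(na + nb)      # SPD joint covariance
mu = rng.standard_normal(na + nb)      # joint mean

Lambda = np.linalg.inv(Sigma)          # information matrix
eta = Lambda @ mu                      # assumed definition: eta = Lambda mu

S_aa, S_ab = Sigma[:na, :na], Sigma[:na, na:]
S_ba, S_bb = Sigma[na:, :na], Sigma[na:, na:]
L_aa, L_ab = Lambda[:na, :na], Lambda[:na, na:]
L_ba, L_bb = Lambda[na:, :na], Lambda[na:, na:]
mu_a, mu_b = mu[:na], mu[na:]
eta_a, eta_b = eta[:na], eta[na:]

b = rng.standard_normal(nb)            # observed value of b

# Marginal p(a): covariance form, then information form.
marg_mean, marg_cov = mu_a, S_aa
marg_eta = eta_a - L_ab @ np.linalg.solve(L_bb, eta_b)
marg_info = L_aa - L_ab @ np.linalg.solve(L_bb, L_ba)

# Conditional p(a | b): covariance form, then information form.
cond_mean = mu_a + S_ab @ np.linalg.solve(S_bb, b - mu_b)
cond_cov = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)
cond_eta = eta_a - L_ab @ b
cond_info = L_aa

# Convert the information form back and compare with the covariance form.
marginal_ok = (np.allclose(np.linalg.inv(marg_info), marg_cov)
               and np.allclose(np.linalg.solve(marg_info, marg_eta), marg_mean))
conditional_ok = (np.allclose(np.linalg.inv(cond_info), cond_cov)
                  and np.allclose(np.linalg.solve(cond_info, cond_eta), cond_mean))
print(marginal_ok, conditional_ok)
```

The table shows the trade-off: marginalization is trivial in covariance form but needs a Schur complement in information form, while conditioning is the reverse.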
References
深蓝学院 (Shenlan Academy), "手写VIO" (hand-written VIO) course