参考资料:
- 《统计学习方法》李航
- https://www.zhihu.com/question/33959624
Step1. 似然函数
在朴素贝叶斯模型中,我们需要通过训练集确定的参数为 θ k = P ( y = c k ) \theta_k=P(y=c_k) θk=P(y=ck) 和 μ j l k = P ( x ( j ) = a j l ∣ y = c k ) \mu_{jlk}=P(x^{(j)}=a_{jl}|y=c_k) μjlk=P(x(j)=ajl∣y=ck)
似然函数:
L ( θ , μ ) = ∏ i = 1 N P ( x i , y i ) = ∏ i = 1 N P ( y i ) P ( x i ∣ y i ) (乘法公式) = ∏ i = 1 N ( P ( y i ) ∏ j = 1 n P ( x i ( j ) ∣ y i ) ) (条件独立假设) = ∏ i = 1 N ∏ k = 1 K ( P ( y = c k ) ∏ j = 1 n P ( x i ( j ) ∣ y i = c k ) ) I ( y i = c k ) = ∏ i = 1 N ∏ k = 1 K ( θ k ∏ j = 1 n ∏ l = 1 L j P I ( x i ( j ) = a j l ) ( x ( j ) = a j l ∣ y i = c k ) ) I ( y i = c k ) = ∏ i = 1 N ∏ k = 1 K ( θ k ∏ j = 1 n ∏ l = 1 L j μ j l k I ( x i ( j ) = a j l ) ) I ( y i = c k ) \begin{align} L(\theta,\mu)&=\prod\limits_{i=1}^{N}P(x_i,y_i)\\ &=\prod\limits_{i=1}^{N}P(y_i)P(x_i|y_i)(乘法公式)\\ &=\prod\limits_{i=1}^{N}\Big(P(y_i)\prod\limits_{j=1}^{n}P(x^{(j)}_i|y_i)\Big)(条件独立假设)\\ &=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(P(y=c_k)\prod\limits_{j=1}^{n}P(x^{(j)}_i|y_i=c_k)\Big)^{I(y_i=c_k)}\\ &=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(\theta_k\prod\limits_{j=1}^{n}\prod\limits_{l=1}^{L_j}P^{I(x^{(j)}_i=a_{jl})}(x^{(j)}=a_{jl}|y_i=c_k)\Big)^{I(y_i=c_k)}\\ &=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(\theta_k\prod\limits_{j=1}^{n}\prod\limits_{l=1}^{L_j}\mu_{jlk}^{I(x^{(j)}_i=a_{jl})}\Big)^{I(y_i=c_k)}\\ \end{align} L(θ,μ)=i=1∏NP(xi,yi)=i=1∏NP(yi)P(xi∣yi)(乘法公式)=i=1∏N(P(yi)j=1∏nP(xi(j)∣yi))(条件独立假设)=i=1∏Nk=1∏K(P(y=ck)j=1∏nP(xi(j)∣yi=ck))I(yi=ck)=i=1∏Nk=1∏K(θkj=1∏nl=1∏LjPI(xi(j)=ajl)(x(j)=ajl∣yi=ck))I(yi=ck)=i=1∏Nk=1∏K(θkj=1∏nl=1∏LjμjlkI(xi(j)=ajl))I(yi=ck)
其中, N N N 为样本数, n n n 为 X X X 的维数, L j L_j Lj为 X ( j ) X^{(j)} X(j) 可能的取值数量, K K K 为 Y Y Y 可能的取值数量。
取对数:
l ( θ , μ ) = ∑ i = 1 N ∑ k = 1 K I ( y i = c k ) ( log θ k + ∑ j = 1 n ∑ l = 1 L j I ( x i ( j ) = a j l ) log μ j l k ) \begin{align} l(\theta,\mu)&=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)\Big(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk}\Big) \end{align} l(θ,μ)=i=1∑Nk=1∑KI(yi=ck)(logθk+j=1∑nl=1∑LjI(xi(j)=ajl)logμjlk)
Step2. 求 θ k \theta_k θk
利用拉格朗日乘数法引入约束条件 ∑ k = 1 K θ k = 1 \sum\limits_{k=1}^{K}\theta_k=1 k=1∑Kθk=1,得:
F ( θ , μ , λ ) = ∑ i = 1 N ∑ k = 1 K I ( y i = c k ) ( log θ k + ∑ j = 1 n ∑ l = 1 L j I ( x i ( j ) = a j l ) log μ j l k ) + λ ( ∑ k = 1 K θ k − 1 ) \begin{align} F(\theta,\mu,\lambda)=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk})+\lambda(\sum\limits_{k=1}^{K}\theta_k-1) \end{align} F(θ,μ,λ)=i=1∑Nk=1∑KI(yi=ck)(logθk+j=1∑nl=1∑LjI(xi(j)=ajl)logμjlk)+λ(k=1∑Kθk−1)
对 F F F 求偏导并令偏导数为 0 0 0 ,得:
θ k = − ∑ i = 1 N I ( y i = c k ) λ ∑ k = 1 K θ k = − N λ = 1 \begin{align} \theta_k&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{\lambda}\\ \sum\limits_{k=1}^{K}\theta_k&=-\frac{N}{\lambda}=1 \end{align} θkk=1∑Kθk=−λi=1∑NI(yi=ck)=−λN=1
其中, N k N_k Nk 为样本中 Y = c k Y=c_k Y=ck 的数量。联立上面的两个方程,得:
θ k = ∑ i = 1 N I ( y i = c k ) N \begin{align} \theta_k=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N} \end{align} θk=Ni=1∑NI(yi=ck)
Step3. 求 μ l k \mu_{lk} μlk
利用拉格朗日乘数法引入约束条件 ∑ l = 1 L j μ l k = 1 \sum\limits_{l=1}^{L_j}\mu_{lk}=1 l=1∑Ljμlk=1,得:
F ( θ , μ , λ ) = ∑ i = 1 N ∑ k = 1 K I ( y i = c k ) ( log θ k + ∑ j = 1 n ∑ l = 1 L j I ( x i ( j ) = a j l ) log μ j l k ) + λ ( ∑ l = 1 L j μ l k − 1 ) \begin{align} F(\theta,\mu,\lambda)=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)\Big(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk})+\lambda(\sum\limits_{l=1}^{L_j}\mu_{lk}-1\Big) \end{align} F(θ,μ,λ)=i=1∑Nk=1∑KI(yi=ck)(logθk+j=1∑nl=1∑LjI(xi(j)=ajl)logμjlk)+λ(l=1∑Ljμlk−1)
对 F F F 求偏导并令偏导数为 0 0 0 ,得:
μ j l k = − ∑ i = 1 N I ( y i = c k , x i ( j ) = a j l ) λ ∑ l = 1 L j μ l k = − ∑ i = 1 N I ( y i = c k ) λ = 1 \begin{align} \mu_{jlk}&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k,x^{(j)}_i=a_{jl})}{\lambda}\\ \sum\limits_{l=1}^{L_j}\mu_{lk}&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{\lambda}=1 \end{align} μjlkl=1∑Ljμlk=−λi=1∑NI(yi=ck,xi(j)=ajl)=−λi=1∑NI(yi=ck)=1
联立上面两个方程,得:
μ j l k = ∑ i = 1 N I ( y i = c k , x i ( j ) = a j l ) ∑ i = 1 N I ( y i = c k ) \begin{align} \mu_{jlk}=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k,x^{(j)}_i=a_{jl})}{\sum\limits_{i=1}^{N}I(y_i=c_k)} \end{align} μjlk=i=1∑NI(yi=ck)i=1∑NI(yi=ck,xi(j)=ajl)