方差分析
文章目录
- 方差分析
- @[toc]
- 1 单因素方差分析
- 2 多因素方差分析
文章目录
- 方差分析
- @[toc]
- 1 单因素方差分析
- 2 多因素方差分析
# 配置stata17在jupyter 配置
import stata_setup
stata_setup.config(r"D:\Stata17\setup",'se')
1 单因素方差分析
from pystata import stata
# 查看单因素方差分析命令介绍
stata.run("help(oneway)")
stata.run("webuse apple")
(Apple trees)
# 描述统计
%stata sum
# 或者stata.run("sum")
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------treatment | 10 2.3 1.159502 1 4weight | 10 80.62 25.36212 48.9 117.5
# 频率表
%stata tab weight treatment
# 或者stata.run("tab weight treatment")
Average |weight in | Fertilizergrams | 1 2 3 4 | Total
-----------+--------------------------------------------+----------48.9 | 0 1 0 0 | 1 50.4 | 0 1 0 0 | 1 58.9 | 0 1 0 0 | 1 67.3 | 0 0 0 1 | 1 70.4 | 0 0 1 0 | 1 86.9 | 0 0 1 0 | 1 87.7 | 0 0 0 1 | 1 104.4 | 1 0 0 0 | 1 113.8 | 1 0 0 0 | 1 117.5 | 1 0 0 0 | 1
-----------+--------------------------------------------+----------Total | 3 3 2 2 | 10
# 散点图
%stata scatter weight treatment
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vjtjpRBH-1683949793570)(output_7_0.svg)]
方差分析原假设为
H 0 H_0 H0: treatment水平对 weight没有影响;
H 1 H_1 H1: treatment水平对 weight有影响;
构建F统计量
F = M S A M S E = S S A / ( r − 1 ) S S E / ( n − r ) ∼ F ( r − 1 , n − r ) \mathrm{F}=\frac{\mathrm{MSA}}{\mathrm{MSE}}=\frac{\mathrm{SSA} /(\mathrm{r}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r})} \sim \mathrm{F}(\mathrm{r}-1, \mathrm{n}-\mathrm{r}) F=MSEMSA=SSE/(n−r)SSA/(r−1)∼F(r−1,n−r)
其中MSA表示组间均方差,包括组间差异与随机差异。MSE表示水平均均方差,当二者差距不大时,表明结果的差异性主要来随机性差异,反之为系统性差异(水平对结果存在干扰)
# 单因素方差分析
stata.run("oneway weight treatment")
Analysis of varianceSource SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 5295.54433 3 1765.18144 21.46 0.0013Within groups 493.591667 6 82.2652778
------------------------------------------------------------------------Total 5789.136 9 643.237333Bartlett's equal-variances test: chi2(3) = 1.3900 Prob>chi2 = 0.708
# 加入tabulate参数可以查看水平内weight均值和标准误等信息
stata.run("oneway weight treatment,tabulate")
| Summary of Average weight in gramsFertilizer | Mean Std. dev. Freq.
------------+------------------------------------1 | 111.9 6.7535176 32 | 52.733333 5.3928966 33 | 78.65 11.667262 24 | 77.5 14.424978 2
------------+------------------------------------Total | 80.62 25.362124 10Analysis of varianceSource SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 5295.54433 3 1765.18144 21.46 0.0013Within groups 493.591667 6 82.2652778
------------------------------------------------------------------------Total 5789.136 9 643.237333Bartlett's equal-variances test: chi2(3) = 1.3900 Prob>chi2 = 0.708
F值为21.46,p值非常小,说明treat对weight又影响。由于方差分析要保证同方差假定,最后结果表明equal-variance伴随概率为0.708不拒绝原假设,即同方差,结果可靠
2 多因素方差分析
# 查看anova用法
%stata help(anova)
多因素方差分析原假设为
H 0 H_0 H0: 水平A或B对 weight没有影响;
H 1 H_1 H1:水平A或B对 weight有影响;
构建F统计量
F A = M S A M S E = S S A / ( r − 1 ) S S E / ( n − r − k + 1 ) ∼ F ( r − 1 , n − r − k + 1 ) F B = M S B M S E = S S B / ( k − 1 ) S S E / ( n − r − k + 1 ) ∼ F ( k − 1 , n − r − k + 1 ) \begin{aligned} &\mathrm{F}_{\mathrm{A}}=\frac{\mathrm{MSA}}{\mathrm{MSE}}=\frac{\mathrm{SSA} /(\mathrm{r}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r}-\mathrm{k}+1)} \sim \mathrm{F}(\mathrm{r}-1, \mathrm{n}-\mathrm{r}-\mathrm{k}+1) \\ &\mathrm{F}_{\mathrm{B}}=\frac{\mathrm{MSB}}{\mathrm{MSE}}=\frac{\mathrm{SSB} /(\mathrm{k}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r}-\mathrm{k}+1)} \sim \mathrm{F}(\mathrm{k}-1, \mathrm{n}-\mathrm{r}-\mathrm{k}+1) \end{aligned} FA=MSEMSA=SSE/(n−r−k+1)SSA/(r−1)∼F(r−1,n−r−k+1)FB=MSEMSB=SSE/(n−r−k+1)SSB/(k−1)∼F(k−1,n−r−k+1)
# 单个因素情形
stata.run("""
webuse systolic
anova systolic drug
""")
.
. webuse systolic
(Systolic blood pressure data). anova systolic drugNumber of obs = 58 R-squared = 0.3355Root MSE = 10.7211 Adj R-squared = 0.2985Source | Partial SS df MS F Prob>F-----------+----------------------------------------------------Model | 3133.2385 3 1044.4128 9.09 0.0001|drug | 3133.2385 3 1044.4128 9.09 0.0001|Residual | 6206.9167 54 114.9429 -----------+----------------------------------------------------Total | 9340.1552 57 163.86237 .
# 双因素方差分析
%stata anova systolic drug disease
Number of obs = 58 R-squared = 0.3803Root MSE = 10.5503 Adj R-squared = 0.3207Source | Partial SS df MS F Prob>F-----------+----------------------------------------------------Model | 3552.0722 5 710.41445 6.38 0.0001|drug | 3063.4329 3 1021.1443 9.17 0.0001disease | 418.83374 2 209.41687 1.88 0.1626|Residual | 5788.0829 52 111.30929 -----------+----------------------------------------------------Total | 9340.1552 57 163.86237
# 交互项情形
%stata anova systolic drug##disease
Number of obs = 58 R-squared = 0.4560Root MSE = 10.5096 Adj R-squared = 0.3259Source | Partial SS df MS F Prob>F-------------+----------------------------------------------------Model | 4259.3385 11 387.21259 3.51 0.0013|drug | 2997.4719 3 999.15729 9.05 0.0001disease | 415.87305 2 207.93652 1.88 0.1637drug#disease | 707.26626 6 117.87771 1.07 0.3958|Residual | 5080.8167 46 110.45254 -------------+----------------------------------------------------Total | 9340.1552 57 163.86237
# 三因素情形
stata.run("""
webuse manuf
anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#meth
""")
.
. webuse manuf
(Manufacturing process data). anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#methNumber of obs = 36 R-squared = 0.5474Root MSE = 2.62996 Adj R-squared = 0.3399Source | Partial SS df MS F Prob>F----------------------+----------------------------------------------------Model | 200.75 11 18.25 2.64 0.0227|temperature | 30.5 2 15.25 2.20 0.1321chemical | 12.25 1 12.25 1.77 0.1958temperature#chemical | 24.5 2 12.25 1.77 0.1917method | 42.25 1 42.25 6.11 0.0209temperature#method | 87.5 2 43.75 6.33 0.0062chemical#method | .25 1 .25 0.04 0.8508temperature#chemical# | method | 3.5 2 1.75 0.25 0.7785|Residual | 166 24 6.9166667 ----------------------+----------------------------------------------------Total | 366.75 35 10.478571 .
更多详情help(anova)