How to group data in R
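1. The factor() function
factor() converts a vector into a categorical variable. Each distinct value becomes a level, and the levels define the groups.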
> mtcars$cyl <- factor(mtcars$cyl)
> mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8
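As a supplementary sketch (not part of the original walkthrough), factor() also accepts explicit levels and labels. The order of levels sets the group order, and the factor is stored as integer codes that point into those levels. The labels "four", "six" and "eight" are illustrative choices, and datasets::mtcars is used so the sketch reads the original numeric cyl column even after the conversion above.

```r
# Supplementary sketch: explicit levels and labels for the cylinder groups.
# datasets::mtcars reads the original numeric cyl column, even if
# mtcars$cyl was already converted to a factor earlier in the session.
# The labels "four", "six", "eight" are illustrative, not from the post.
cyl_f <- factor(datasets::mtcars$cyl,
                levels = c(4, 6, 8),
                labels = c("four", "six", "eight"))

levels(cyl_f)            # group names, in the order set by `levels`
nlevels(cyl_f)           # number of groups: 3
head(as.integer(cyl_f))  # stored codes for 6 6 4 6 8 6: 2 2 1 2 3 2
```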
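2. The cut() function
cut() divides a numeric vector into intervals given by breaks. By default each interval is open on the left and closed on the right, such as (10,20], and an interval that contains no values, such as (40,50], is still kept as a level.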
> cut(mtcars$mpg,seq(10,50,10))
 [1] (20,30] (20,30] (20,30] (20,30] (10,20] (10,20] (10,20] (20,30] (20,30] (10,20] (10,20]
[12] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (30,40] (30,40] (30,40] (20,30] (10,20]
[23] (10,20] (10,20] (10,20] (20,30] (20,30] (30,40] (10,20] (10,20] (10,20] (20,30]
Levels: (10,20] (20,30] (30,40] (40,50]
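A hedged variation on the call above: right = FALSE makes the intervals left-closed, [10,20) and so on, and labels replaces the interval notation with names of your choosing. The label strings below are hypothetical. Values that fall outside every interval are not assigned a bin and become NA.

```r
# Variation on the cut() call above: left-closed bins with readable labels.
# The label strings are hypothetical names chosen for illustration.
breaks <- seq(10, 50, 10)
mpg_bins <- cut(datasets::mtcars$mpg,
                breaks = breaks,
                right  = FALSE,   # intervals [10,20), [20,30), [30,40), [40,50)
                labels = c("10 to <20", "20 to <30", "30 to <40", "40 to <50"))

levels(mpg_bins)
# No mpg value lies exactly on a break, so the counts match the (a,b] version
table(mpg_bins)

# Values outside every interval become NA (here 5 and 55)
cut(c(5, 15, 55), breaks = breaks)
```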
Frequency counts in R
1. The table() function
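# Frequencies of one variable
> table(mtcars$cyl)
 4  6  8
11  7 14
> table(cut(mtcars$mpg,seq(10,50,10)))
(10,20] (20,30] (30,40] (40,50]
     18      10       4       0
# Frequencies of two variables (a two-way table, e.g. a control group and a treatment group)
> table(mtcars$cyl,mtcars$gear)
     3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2
# with() evaluates the expression inside the data frame, so its columns can be referred to directly by name
> with(mtcars,table(cyl,gear)) # cyl and gear are looked up in the mtcars data set
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2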
# Three-way contingency table
> with(mtcars,table(cyl,gear,carb))
, , carb = 1
   gear
cyl 3 4 5
  4 1 4 0
  6 2 0 0
  8 0 0 0
, , carb = 2
   gear
cyl 3 4 5
  4 0 4 2
  6 0 0 0
  8 4 0 0
, , carb = 3
   gear
cyl 3 4 5
  4 0 0 0
  6 0 0 0
  8 3 0 0
, , carb = 4
   gear
cyl 3 4 5
  4 0 0 0
  6 0 4 0
  8 5 0 1
, , carb = 6
   gear
cyl 3 4 5
  4 0 0 0
  6 0 0 1
  8 0 0 0
, , carb = 8
   gear
cyl 3 4 5
  4 0 0 0
  6 0 0 0
  8 0 0 1
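One behaviour worth knowing when counting: table() leaves out missing values unless the useNA argument is set. The short vector below is made-up data for illustration only.

```r
# Made-up vector with one missing value, for illustration only.
answers <- c("yes", "no", "yes", NA, "no", "yes")

table(answers)                    # NA is dropped: no = 2, yes = 3
table(answers, useNA = "ifany")   # adds an <NA> count because a missing value exists
table(answers, useNA = "always")  # always shows the <NA> count, even when it is 0
```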
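2. The xtabs() function
xtabs(formula = ~., data)
formula: a formula whose right-hand side lists the cross-classifying variables, separated by + (a column of counts may optionally be given on the left-hand side)
data: a data frame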
# Frequencies of one variable
> xtabs(~cyl,mtcars)
cyl
 4  6  8
11  7 14
# Frequencies of two variables
> xtabs(~cyl+gear,mtcars)
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2
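# Three-way contingency table (same counts as with(mtcars,table(cyl,gear,carb)))
> xtabs(~cyl+gear+carb,mtcars)
# Flat contingency table: ftable() lays a multi-way table out as a single two-dimensional grid
> y <- xtabs(~cyl+gear+carb,mtcars)
> ftable(y)
         carb 1 2 3 4 6 8
cyl gear
4   3         1 0 0 0 0 0
    4         4 4 0 0 0 0
    5         0 2 0 0 0 0
6   3         2 0 0 0 0 0
    4         0 0 0 4 0 0
    5         0 0 0 0 1 0
8   3         0 4 3 5 0 0
    4         0 0 0 0 0 0
    5         0 0 0 1 0 1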
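A sketch of the formula's optional left-hand side. as.data.frame() turns a two-way table into long format, with one row per cell and the counts in a column named Freq, and xtabs(Freq ~ ...) rebuilds the same table from those pre-aggregated counts. The same pattern applies whenever your data already hold counts rather than one row per observation.

```r
# Two-way table from raw rows (one row per car), as in the example above.
tab <- xtabs(~ cyl + gear, data = datasets::mtcars)

# Long format: one row per (cyl, gear) cell, counts in a column named Freq.
long <- as.data.frame(tab)
long

# Pre-aggregated counts go on the left-hand side of the formula.
tab2 <- xtabs(Freq ~ cyl + gear, data = long)
all(tab2 == tab)   # TRUE: same count in every cell

# ftable() works on the rebuilt table as well.
ftable(tab2)
```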
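Relative frequencies (proportions) in R
1. The prop.table() function
prop.table() divides each count by the total, turning frequencies into proportions.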
> prop.table(table(mtcars$cyl))
      4       6       8
0.34375 0.21875 0.43750
> prop.table(table(cut(mtcars$mpg,seq(10,50,10))))
(10,20] (20,30] (30,40] (40,50]
 0.5625  0.3125  0.1250  0.0000
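For a one-way table, prop.table() is simply each count divided by the grand total. The check below confirms that on the cylinder counts and shows one possible way (a presentation choice, not from the post) to report the shares as percentages.

```r
counts <- table(datasets::mtcars$cyl)
p <- prop.table(counts)

# Same numbers as dividing each count by the grand total
all.equal(as.vector(p), as.vector(counts / sum(counts)))   # TRUE
sum(p)                                                      # proportions sum to 1

# Percentages rounded to one decimal place
round(100 * p, 1)
```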
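For a two-way contingency table, margin.table() computes the marginal frequencies and prop.table() computes the proportions, either by row or by column.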
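- The margin.table() function
margin.table(x, margin = NULL)
x: an array (for example, a table produced by table() or xtabs())
margin: a vector giving the margins to compute sums for, i.e. the dimensions that are kept while all others are summed over. For a matrix, 1 means rows, 2 means columns, and c(1,2) means both rows and columns.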
> x <- xtabs(~cyl+gear,mtcars)
# Total count over the whole array
> margin.table(x)
[1] 32
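# Row totals: margin 1 keeps cyl and sums over gear
> margin.table(x,1)
cyl
 4  6  8
11  7 14
# Column totals: margin 2 keeps gear and sums over cyl
> margin.table(x,2)
gear
 3  4  5
15 12  5
# Keeping both margins returns the full two-way table
> margin.table(x,c(1,2))
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2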
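For a two-way table, margin.table() agrees with base R's rowSums() and colSums(). The sketch below checks this on the same cyl by gear table, which is a quick way to convince yourself which margin number gives which totals.

```r
x <- xtabs(~ cyl + gear, data = datasets::mtcars)

# margin = 1 keeps the cyl dimension and sums over gear: the row totals
all(margin.table(x, 1) == rowSums(x))        # TRUE
# margin = 2 keeps the gear dimension and sums over cyl: the column totals
all(margin.table(x, 2) == colSums(x))        # TRUE
# No margin: the grand total, i.e. the number of cars
margin.table(x) == nrow(datasets::mtcars)    # TRUE
```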
- The prop.table() function
# Proportions within each row (each row sums to 1)
> prop.table(x,1)
   gear
cyl          3          4          5
  4 0.09090909 0.72727273 0.18181818
  6 0.28571429 0.57142857 0.14285714
  8 0.85714286 0.00000000 0.14285714
# Proportions within each column (each column sums to 1)
> prop.table(x,2)
   gear
cyl          3          4          5
  4 0.06666667 0.66666667 0.40000000
  6 0.13333333 0.33333333 0.20000000
  8 0.80000000 0.00000000 0.40000000
# Proportions of the grand total (all cells together sum to 1)
> prop.table(x)
   gear
cyl       3       4       5
  4 0.03125 0.25000 0.06250
  6 0.06250 0.12500 0.03125
  8 0.37500 0.00000 0.06250
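To make the row/column distinction concrete: prop.table(x, 1) divides each cell by its row total and prop.table(x, 2) divides it by its column total. The sketch reproduces both by hand with base R; the manual versions are for illustration, not a replacement for prop.table().

```r
x <- xtabs(~ cyl + gear, data = datasets::mtcars)

# Row proportions: x / rowSums(x) recycles the row totals down each column,
# so cell [i, j] is divided by the total of row i.
row_manual <- x / rowSums(x)
all.equal(as.vector(prop.table(x, 1)), as.vector(row_manual))   # TRUE

# Column proportions: sweep() divides every column by its own total.
col_manual <- sweep(x, 2, colSums(x), "/")
all.equal(as.vector(prop.table(x, 2)), as.vector(col_manual))   # TRUE

# Row proportions add up to 1 within each row
rowSums(prop.table(x, 1))
```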
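- The addmargins() function
addmargins() appends margin totals (a Sum row and/or a Sum column) to a table. Its margin argument names the dimension that is summed over, so addmargins(x,1) adds a Sum row of column totals, whereas margin.table(x,1) returns the row totals.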
> addmargins(x)
     gear
cyl    3  4  5 Sum
  4    1  8  2  11
  6    2  4  1   7
  8   12  0  2  14
  Sum 15 12  5  32
> addmargins(x,1)
     gear
cyl    3  4  5
  4    1  8  2
  6    2  4  1
  8   12  0  2
  Sum 15 12  5
> addmargins(x,2)
   gear
cyl  3  4  5 Sum
  4  1  8  2  11
  6  2  4  1   7
  8 12  0  2  14
> addmargins(x,c(1,2))
     gear
cyl    3  4  5 Sum
  4    1  8  2  11
  6    2  4  1   7
  8   12  0  2  14
  Sum 15 12  5  32
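addmargins() combines naturally with prop.table(). The sketch below (one possible presentation, not the post's) appends a Sum column to the row proportions, where every entry should be 1, and builds a percentage table of the grand total with totals on both margins.

```r
x <- xtabs(~ cyl + gear, data = datasets::mtcars)

# Row proportions plus a Sum column (margin 2): each row's Sum is 1
row_tab <- addmargins(prop.table(x, 1), 2)
row_tab

# Shares of the grand total as percentages, with a Sum row and column.
# Totals are added before rounding, so the corner cell is exactly 100.
pct <- round(100 * addmargins(prop.table(x)), 1)
pct
```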