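NumPy numerical statistics functions
| Function | Description |
| --- | --- |
| np.sum | Sum of all elements |
| np.prod | Product of all elements |
| np.cumsum | Cumulative sum (sum of the first 1, first 2, ... elements) |
| np.cumprod | Cumulative product (product of the first 1, first 2, ... elements) |
| np.min | Minimum |
| np.max | Maximum |
| np.percentile | Percentile, with the position given on a 0-100 scale |
| np.quantile | Quantile, with the position given on a 0-1 scale |
| np.median | Median |
| np.mean | Arithmetic mean |
| np.std | Standard deviation |
| np.var | Variance |
| np.average | Weighted average; weights can be passed through the weights parameter |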
1. Applying the functions
1.1 np.sum(x)
import numpy as np
x = np.arange(12).reshape(3, 4)  # 3x4 array holding 0..11
print("x=", x)
print("np.sum(x)=", np.sum(x))  # sum over every element
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.sum(x)= 66
1.2 np.prod(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.prod(x)=", np.prod(x))  # product of every element; 0 here because x contains 0
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.prod(x)= 0
1.3 np.cumsum(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.cumsum(x)=", np.cumsum(x))  # running sum over the flattened array
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.cumsum(x)= [ 0 1 3 6 10 15 21 28 36 45 55 66]
1.4 np.cumprod(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.cumprod(x)=", np.cumprod(x))  # running product over the flattened array
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.cumprod(x)= [0 0 0 0 0 0 0 0 0 0 0 0]
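Because x starts at 0, both np.prod(x) and np.cumprod(x) collapse to zero, which hides what the functions actually do. A minimal sketch on an array without zeros may make the running product easier to see. The array `y` is introduced here purely for illustration and is not part of the original example.

```python
import numpy as np

# y is a hypothetical example array, chosen so that it contains no zero
y = np.arange(1, 6)  # [1 2 3 4 5]

total = np.prod(y)       # 1*2*3*4*5
running = np.cumprod(y)  # product of the first 1, first 2, ... elements

print("np.prod(y)=", total)       # 120
print("np.cumprod(y)=", running)  # [  1   2   6  24 120]
```

The last entry of the cumulative product equals the overall product.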
1.5 np.min(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.min(x)=", np.min(x))  # smallest element
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.min(x)= 0
1.6 np.max(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.max(x)=", np.max(x))  # largest element
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.max(x)= 11
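1.7 np.percentile and np.quantile
Both functions sort the flattened array and return the value at the requested position. When that position falls between two elements, NumPy by default interpolates linearly between the two neighbors. The result is their simple average only when the position lies exactly halfway between them, as with the median of an even-length array.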
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# np.percentile sorts the values in ascending order and returns those at the 25%, 50% and 75% positions
print("np.percentile(x,[25,50,75])=", np.percentile(x, [25, 50, 75]))
# np.quantile does the same, with the positions given as fractions: 0.25, 0.5 and 0.75
print("np.quantile(x,[0.25,0.5,0.75])=", np.quantile(x, [0.25, 0.5, 0.75]))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.percentile(x,[25,50,75])= [2.75 5.5 8.25]
np.quantile(x,[0.25,0.5,0.75])= [2.75 5.5 8.25]
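The value 2.75 for the 25% position is not an element of x, so it may help to reproduce it by hand. The sketch below assumes NumPy's default linear interpolation, where the fractional position in the sorted data is q * (n - 1). The variable names are illustrative only.

```python
import numpy as np

data = np.sort(np.arange(12))  # sorted values 0..11
q = 0.25

# Fractional position in the sorted array under linear interpolation
pos = q * (len(data) - 1)   # 0.25 * 11 = 2.75
lower = int(np.floor(pos))  # index 2
upper = int(np.ceil(pos))   # index 3

# Interpolate between the two neighbouring values
manual = data[lower] + (pos - lower) * (data[upper] - data[lower])

print("manual =", manual)                     # 2.75
print("np.quantile =", np.quantile(data, q))  # 2.75
```

Here the position 2.75 lies three quarters of the way from the element 2 to the element 3, so the result is 2.75 rather than their midpoint 2.5.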
1.8 np.median(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# median
print("np.median(x)=", np.median(x))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.median(x)= 5.5
1.9 np.mean(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# mean
print("np.mean(x)=", np.mean(x))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.mean(x)= 5.5
1.10 np.std(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# standard deviation
print("np.std(x)=", np.std(x))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.std(x)= 3.452052529534663
1.11 np.var(x)
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# variance
print("np.var(x)=", np.var(x))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.var(x)= 11.916666666666666
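The two results above are related: the standard deviation is the square root of the variance. By default both functions divide by the number of elements N (the population form). The sketch below also shows the `ddof=1` option, which divides by N - 1 instead. The variable names are illustrative, and the article itself uses only the default.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# The standard deviation is the square root of the variance
print(np.isclose(np.sqrt(np.var(x)), np.std(x)))  # True

# Default (ddof=0): divide the summed squared deviations by N = 12
pop_var = np.var(x)
# ddof=1: divide by N - 1 = 11 instead (sample variance)
sample_var = np.var(x, ddof=1)

print("population variance:", pop_var)
print("sample variance:", sample_var)
```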
1.12 np.average(x,weight)
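import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# Random weights with the same shape as x, so the result changes on every run
weight = np.random.randn(*x.shape)
# np.average computes the weighted average
print("np.average(x,weight)=", np.average(x, weights=weight))
Output: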
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.average(x,weight)= 5.666459528596581
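Since the weights above are random, the printed value cannot be reproduced. A small deterministic sketch may show more clearly what np.average computes, namely sum(w * x) / sum(w). The scores and weights below are made-up illustration data, not from the original example.

```python
import numpy as np

# Hypothetical example data: three scores and their weights
scores = np.array([80, 90, 100])
weights = np.array([1, 2, 3])

# Weighted average written out by hand: sum(w * x) / sum(w)
manual = np.sum(scores * weights) / np.sum(weights)
builtin = np.average(scores, weights=weights)

print("manual =", manual)
print("np.average =", builtin)

# Without weights, np.average reduces to np.mean
print(np.isclose(np.average(scores), np.mean(scores)))  # True
```

The larger weight on the highest score pulls the weighted average above the plain mean of 90.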
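2. What the axis parameter does in NumPy
For a two-dimensional array:
- axis=0 is the row axis: the operation runs down the rows and returns one result per column
- axis=1 is the column axis: the operation runs across the columns and returns one result per row
One way to remember it: axis=0 collapses the rows, and axis=1 collapses the columns.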
import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
print("np.sum(x,axis=0)=", np.sum(x, axis=0))  # one sum per column
print("np.sum(x,axis=1)=", np.sum(x, axis=1))  # one sum per row
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
np.sum(x,axis=0)= [12 15 18 21]
np.sum(x,axis=1)= [ 6 22 38]
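The same axis rule applies to the other functions in the table. The sketch below prints the result shapes to make the "collapsing" visible. The variable names are illustrative only.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # shape (3, 4): 3 rows, 4 columns

col_means = np.mean(x, axis=0)  # collapses the 3 rows, one value per column
row_max = np.max(x, axis=1)     # collapses the 4 columns, one value per row

print(col_means.shape, col_means)  # (4,) [4. 5. 6. 7.]
print(row_max.shape, row_max)      # (3,) [ 3  7 11]

# cumsum keeps the shape and accumulates along the chosen axis
print(np.cumsum(x, axis=0))
```

The last row of np.cumsum(x, axis=0) equals np.sum(x, axis=0), since it has accumulated every row.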
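3. Example: standardizing data for machine learning
Here:
- each row is one sample
- each column is one attribute (feature) of the samples
Why standardize?
- In machine learning, training converges faster when all columns are on the same scale
- Attributes can differ widely in range: if attribute A is a price between 0 and 100 yuan and attribute B is a sales volume above 1,000,000, the two are not directly comparable, so they need to be standardized
- Each column represents a different feature, so the mean and standard deviation are computed per column, collapsing the rows, i.e. axis=0
- The standardization formula is X=(X-mean(X,axis=0))/std(X,axis=0)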
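import numpy as np
x = np.arange(12).reshape(3, 4)
print("x=", x)
# Subtract the column means first, then divide by the column standard deviations
print("standardized", (x - np.mean(x, axis=0)) / np.std(x, axis=0))
Output:
x= [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
standardized [[-1.22474487 -1.22474487 -1.22474487 -1.22474487]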
[ 0. 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487 1.22474487]]
As the output shows, each column of the result sums to 0, which means every column now has mean 0.
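A quick check of this property, as a sketch that applies the formula X=(X-mean(X,axis=0))/std(X,axis=0) from above. The name `z` is introduced here for illustration only.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
z = (x - np.mean(x, axis=0)) / np.std(x, axis=0)

# Each standardized column should have mean 0 and standard deviation 1
print(np.allclose(z.mean(axis=0), 0))  # True
print(np.allclose(z.std(axis=0), 1))   # True

# Worked by hand for the first column [0, 4, 8]: mean 4, std sqrt(32/3)
print((0 - 4) / np.sqrt(32 / 3))  # about -1.2247
```

np.allclose is used instead of an exact comparison because floating-point rounding can leave tiny residuals.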
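Also note that x is a 3×4 array, while np.mean(x,axis=0) is a one-dimensional array of length 4. The subtraction works through NumPy's broadcasting mechanism: the length-4 array is treated as a 1×4 row and stretched to 3×4, as if that row were repeated 3 times, before the element-wise subtraction is carried out. NumPy does this virtually, without physically copying the data.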
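The stretching can be made visible with np.broadcast_to, which returns the 3×4 view that broadcasting implies. This is a sketch for illustration; the subtraction itself never requires calling it explicitly.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
col_means = np.mean(x, axis=0)

print(x.shape)          # (3, 4)
print(col_means.shape)  # (4,)

# np.broadcast_to shows the stretched 3x4 view that broadcasting implies
stretched = np.broadcast_to(col_means, x.shape)
print(stretched)

# Subtracting the stretched view gives the same result as x - col_means
print(np.array_equal(x - col_means, x - stretched))  # True
```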