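This is a repost.
Iris Data Operations with NumPy
I. Experiment Preparation
1.1 Overview
This experiment uses Python and applies the NumPy techniques we have studied so far. By working through it, you will learn how to use NumPy to process real data and deepen your understanding of NumPy.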
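NumPy:
NumPy (short for Numerical Python) is one of the most important foundational packages for numerical computing in Python. Most packages that provide scientific computing functionality use NumPy arrays as their building blocks.
Some of NumPy's features are:
1. ndarray, a fast and space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting capabilities.
2. Standard mathematical functions for fast operations on entire arrays of data, without writing loops.
3. Tools for reading and writing array data on disk and for working with memory-mapped files.
4. Linear algebra, random number generation, and Fourier transform capabilities.
5. A C API for integrating code written in C, C++, Fortran, and similar languages.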
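A few of the features listed above can be sketched in a handful of lines. The arrays and values below are made up for illustration, and `np.random.default_rng` assumes NumPy 1.17 or later; this is only a sketch, not part of the original experiment.

```python
import numpy as np

# Element-wise arithmetic on a whole array, with no explicit Python loop.
lengths_cm = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
lengths_mm = lengths_cm * 10

# Standard math functions also apply to every element at once.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation with a fixed seed for repeatability.
rng = np.random.default_rng(0)
samples = rng.random(3)  # three floats in [0, 1)

print(lengths_mm)
print(roots)
print(x)
print(samples.shape)
```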
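Because NumPy provides a simple, easy-to-use C API, it is straightforward to pass data to external libraries written in a low-level language, and for those libraries to return data to Python as NumPy arrays. This makes Python a language of choice for wrapping legacy C/C++/Fortran codebases and giving them a dynamic, easy-to-use interface.
NumPy by itself does not provide much high-level data analysis functionality, but understanding NumPy arrays and array-oriented computing will help you use tools such as pandas much more effectively. Because NumPy is a large topic, I will cover more advanced NumPy features, such as broadcasting, in Appendix A.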
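Since broadcasting is mentioned here, a minimal sketch of the idea may help: when a 2D array is combined with a 1D array whose length matches the last axis, NumPy stretches the 1D array across every row. The three sample rows below are illustrative values only.

```python
import numpy as np

# Three sample rows with four measurements each, shape (3, 4).
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, 3.0, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2]])

# Column means have shape (4,).
col_means = data.mean(axis=0)

# Broadcasting stretches the (4,) vector across all 3 rows,
# so each column is centered on zero without a loop.
centered = data - col_means

print(centered.shape)
print(np.allclose(centered.mean(axis=0), 0.0))
```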
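For most data analysis applications, the functionality we care about most is:
1. Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and similar tasks.
2. Common array algorithms such as sorting, unique, and set operations.
3. Efficient descriptive statistics and data aggregation/summarization.
4. Data alignment and relational data manipulations for merging and joining heterogeneous datasets.
5. Expressing conditional logic as array expressions instead of loops with if-elif-else branches.
6. Group-wise data manipulations (aggregation, transformation, function application).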
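Several items in the list above (filtering, unique, sorting, descriptive statistics, and conditional logic as an array expression) can be illustrated with a short sketch. The labels and petal lengths are made-up sample values, not taken from the experiment.

```python
import numpy as np

labels = np.array(["setosa", "virginica", "setosa", "versicolor"])
petal_length = np.array([1.4, 6.0, 1.3, 4.7])

# Unique values and how often each occurs (results come back sorted).
names, counts = np.unique(labels, return_counts=True)

# Boolean filtering: keep only the elements where the condition holds.
long_petals = petal_length[petal_length > 2.0]

# Conditional logic as an array expression instead of an if/else loop.
size = np.where(petal_length > 2.0, "long", "short")

# Sorting and a simple descriptive statistic.
ordered = np.sort(petal_length)
mean_length = petal_length.mean()

print(names, counts)
print(long_petals)
print(size)
print(ordered, mean_length)
```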
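1.2 Objectives
- Become familiar with various types of data files
- Master flexible use of NumPy's methods
- Master NumPy techniques for processing real data
- Master the workflow for processing real data with NumPy
1.3 Environment
Environment: Python 3.6 or later, NumPy, Jupyter Notebook, and the Google Chrome or IE browser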
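II. Experiment Steps
2.1 Reading the Data
NumPy can read and write both text and binary data on disk. In this experiment we load the iris dataset directly. It contains the sepal length and width, the petal length and width, and the species of each iris sample. We first import the data straight from the web (this may be slow; afterwards we use a downloaded copy) and then extract one column from the resulting array of tuples.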
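import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# dtype=None lets genfromtxt infer a type per column, giving a 1D array of tuples
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
print(iris_1d)

# Extract one column (index 4, the species label)
species = np.array([row[4] for row in iris_1d])
print(species[:5])
Partial output: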
[(5.1, 3.5, 1.4, 0.2, b'Iris-setosa') (4.9, 3. , 1.4, 0.2, b'Iris-setosa')
 (4.7, 3.2, 1.3, 0.2, b'Iris-setosa') (4.6, 3.1, 1.5, 0.2, b'Iris-setosa')
 (5. , 3.6, 1.4, 0.2, b'Iris-setosa') (5.4, 3.9, 1.7, 0.4, b'Iris-setosa')
 ...
 (6.2, 3.4, 5.4, 2.3, b'Iris-virginica')
 (5.9, 3. , 5.1, 1.8, b'Iris-virginica')]
[b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'
 b'Iris-setosa']
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:4: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  after removing the cwd from sys.path.
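The warning in the output above asks for an explicit `encoding` argument. A minimal sketch of that, assuming a small in-memory sample in the same comma-separated layout as iris.data (so it runs without a download), could look like this. With an explicit encoding the labels should come back as `str` rather than `bytes`, and the default field names `f0` to `f4` give direct access to a column.

```python
import io
import numpy as np

# A few rows in the same layout as iris.data, held in memory for illustration.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Passing an encoding makes genfromtxt return str labels instead of bytes.
iris_sample = np.genfromtxt(sample, delimiter=",", dtype=None, encoding="utf-8")

# With dtype=None the result is a 1D structured array; the default
# field names are f0..f4, so the species column is field "f4".
species_sample = iris_sample["f4"]
print(species_sample)
```

Field access such as `iris_sample["f4"]` is an alternative to the list comprehension used above; both yield a 1D array of labels.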
Converting a 1D array of tuples into a 2D NumPy array
import numpy as np

iris_1d = np.genfromtxt('iris.data', delimiter=',', dtype=None)

# Method 1: convert each row to a list and keep the first 4 items
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
# Print the first 5 rows of the resulting 2D NumPy array
print(iris_2d[:5])

# Method 2: import only the first 4 columns from the source
iris_2d = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0, 1, 2, 3])
# Print the first 5 rows of the resulting 2D NumPy array
print(iris_2d[:5])
Output:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:5: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  """
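As a possible third route to the same 2D array, the numeric fields of the structured array can be stacked side by side. This is a sketch on a two-row in-memory sample; the field names `f0` to `f3` are the defaults that `genfromtxt` assigns when `dtype=None` and no names are given.

```python
import io
import numpy as np

# Two rows in the same layout as iris.data, held in memory for illustration.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)
rows = np.genfromtxt(sample, delimiter=",", dtype=None, encoding="utf-8")

# Each numeric field is a 1D column; stacking them gives a 2D float array.
numeric_fields = ["f0", "f1", "f2", "f3"]  # default names assigned by genfromtxt
sample_2d = np.column_stack([rows[name] for name in numeric_fields])

print(sample_2d.shape)
print(sample_2d.dtype)
```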
Computing the mean, median, and standard deviation of the iris sepal length (column 1)
import numpy as np

# First extract the column to be analyzed
sepallength = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0])
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
Output:
5.843333333333334 5.8 0.8253012917851409
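One detail worth knowing about the statistics above: `np.std` defaults to `ddof=0`, the population standard deviation. The following sketch uses the first five sepal lengths from the dataset to show the difference from the sample standard deviation (`ddof=1`) and to check the population value against its definition.

```python
import numpy as np

# First five sepal lengths of the iris dataset.
values = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

mean = np.mean(values)
median = np.median(values)

# np.std defaults to ddof=0 (population standard deviation, divide by n).
std_population = np.std(values)
# ddof=1 gives the sample standard deviation (divide by n - 1).
std_sample = np.std(values, ddof=1)

# The population value can be reproduced directly from its definition.
manual = np.sqrt(np.mean((values - mean) ** 2))

print(mean, median)
print(std_population, std_sample, manual)
```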
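2.2 Normalizing an Array
Can we normalize an array in NumPy so that its values lie exactly between 0 and 1? The answer is, of course, yes.
import numpy as np

# Rescale the iris sepal length so its values lie between 0 and 1 (minimum 0, maximum 1)
sepallength = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0])
Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)
print(S)

# Alternatively, np.ptp() returns max - min (the ndarray.ptp() method was removed in NumPy 2.0)
S = (sepallength - Smin)/np.ptp(sepallength)
print(S)
Output (values shown to two decimal places):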
[0.22 0.17 0.11 0.08 0.19 0.31 0.08 0.19 0.03 0.17 0.31 0.14 0.14 0. 0.42 0.39 0.31 0.22 0.39 0.22 0.31 0.22 0.08 0.22 0.14 0.19 0.19 0.25 0.25 0.11 0.14 0.31 0.25 0.33 0.17 0.19 0.33 0.17 0.03 0.22 0.19 0.06 0.03 0.19 0.22 0.14 0.22 0.08 0.28 0.19 0.75 0.58 0.72 0.33 0.61 0.39 0.56 0.17 0.64 0.25 0.19 0.44 0.47 0.5 0.36 0.67 0.36 0.42 0.53 0.36 0.44 0.5 0.56 0.5 0.58 0.64 0.69 0.67 0.47 0.39 0.33 0.33 0.42 0.47 0.31 0.47 0.67 0.56 0.36 0.33 0.33 0.5 0.42 0.19 0.36 0.39 0.39 0.53 0.22 0.39 0.56 0.42 0.78 0.56 0.61 0.92 0.17 0.83 0.67 0.81 0.61 0.58 0.69 0.39 0.42 0.58 0.61 0.94 0.94 0.47 0.72 0.36 0.94 0.56 0.67 0.81 0.53 0.5 0.58 0.81 0.86 1. 0.58 0.56 0.5 0.94 0.56 0.58 0.47 0.72 0.67 0.72 0.42 0.69 0.67 0.67 0.56 0.61 0.53 0.44] [0.22 0.17 0.11 0.08 0.19 0.31 0.08 0.19 0.03 0.17 0.31 0.14 0.14 0. 0.42 0.39 0.31 0.22 0.39 0.22 0.31 0.22 0.08 0.22 0.14 0.19 0.19 0.25 0.25 0.11 0.14 0.31 0.25 0.33 0.17 0.19 0.33 0.17 0.03 0.22 0.19 0.06 0.03 0.19 0.22 0.14 0.22 0.08 0.28 0.19 0.75 0.58 0.72 0.33 0.61 0.39 0.56 0.17 0.64 0.25 0.19 0.44 0.47 0.5 0.36 0.67 0.36 0.42 0.53 0.36 0.44 0.5 0.56 0.5 0.58 0.64 0.69 0.67 0.47 0.39 0.33 0.33 0.42 0.47 0.31 0.47 0.67 0.56 0.36 0.33 0.33 0.5 0.42 0.19 0.36 0.39 0.39 0.53 0.22 0.39 0.56 0.42 0.78 0.56 0.61 0.92 0.17 0.83 0.67 0.81 0.61 0.58 0.69 0.39 0.42 0.58 0.61 0.94 0.94 0.47 0.72 0.36 0.94 0.56 0.67 0.81 0.53 0.5 0.58 0.81 0.86 1. 0.58 0.56 0.5 0.94 0.56 0.58 0.47 0.72 0.67 0.72 0.42 0.69 0.67 0.67 0.56 0.61 0.53 0.44]
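The rescaling above is min-max normalization, S = (x - min) / (max - min). A small reusable sketch follows; the helper name `min_max_scale` and its handling of a constant array are choices made here for illustration, not part of the original experiment.

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1D array linearly so that its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    value_range = np.ptp(x)  # same as x.max() - x.min()
    if value_range == 0:
        # All values are equal: avoid dividing by zero.
        # Returning zeros is a choice made for this sketch.
        return np.zeros_like(x)
    return (x - x.min()) / value_range

# 4.3 and 7.9 are the smallest and largest sepal lengths in the dataset.
scaled = min_max_scale([4.3, 5.8, 7.9])
print(scaled)
```

The middle value 5.8 maps to 1.5 / 3.6, which is about 0.417, in line with the value 0.42 that appears in the output above.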
Finding percentiles of a NumPy array
import numpy as np

sepallength = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0])
# Compute the 5th and 95th percentiles of the sepal length
print(np.percentile(sepallength, q=[5, 95]))
# [4.6 7.255]
Output:
[4.6   7.255]
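To see why a percentile can fall between two data values, here is a small sketch on a made-up five-element array. By default `np.percentile` interpolates linearly between neighbouring sorted values.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# q is given in percent; a list of q values returns one result per entry.
p0, p50, p100 = np.percentile(data, q=[0, 50, 100])

# With the default linear interpolation, the 25th percentile sits at sorted
# position 0.25 * (n - 1) = 1.0, which is the value 2.0, and the 90th at
# position 0.9 * 4 = 3.6, which is 4.0 + 0.6 * (5.0 - 4.0) = 4.6.
p25, p90 = np.percentile(data, q=[25, 90])

print(p0, p50, p100)
print(p25, p90)
```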
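To make the percentile call less of a black box, here is a minimal sketch of percentiles computed by linear interpolation between sorted values, which is the default behaviour of np.percentile. The array `sample` and the helper `percentile_linear` are hypothetical names introduced only for this illustration; they are not part of the lab material.

```python
import numpy as np

# Hypothetical sample values standing in for the sepal-length column.
sample = np.array([4.3, 4.6, 5.0, 5.1, 5.8, 6.4, 7.0, 7.9])

def percentile_linear(values, q):
    """Percentile via linear interpolation between neighbouring sorted values."""
    ordered = np.sort(values)
    # Fractional position of the q-th percentile in the sorted data.
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(np.floor(pos))
    upper = int(np.ceil(pos))
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

manual = [percentile_linear(sample, q) for q in (5, 95)]
builtin = np.percentile(sample, q=[5, 95])
print(manual)
print(builtin)
```

Both print statements should show the same two numbers up to floating-point rounding.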
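Inserting values at random positions in an array
import numpy as np

# Insert np.nan at 20 random positions of the iris_2d dataset
iris_2d = np.genfromtxt('iris.data', delimiter=',', dtype='object')

# Method 1
np.random.seed(100)
# i, j hold the row and column indices of all elements of iris_2d
# (note that j also covers column 4, the species column)
i, j = np.where(iris_2d)
# print(i, j)
iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan
print(iris_2d[:10])

# Method 2: rows drawn from 0-149, columns from 0-3 (numeric columns only)
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
print(iris_2d[:10])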
Output:
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']]
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.4' nan b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']]
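Both methods draw the row and column indices independently, so the same cell can be picked twice and fewer than 20 values may actually become NaN. One possible alternative, sketched below under the assumption that exactly 20 distinct cells are wanted, samples flat indices without replacement. The array `data` is a hypothetical stand-in for the four numeric iris columns, and the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(100)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the numeric part of the dataset (150 rows x 4 columns).
data = rng.uniform(0.1, 8.0, size=(150, 4))

# Draw 20 distinct flat indices, then convert them to (row, column) pairs.
flat_idx = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())  # 20, because no position can repeat
```

Sampling flat indices keeps the draw uniform over all cells, and np.unravel_index maps each flat index back to its row and column.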
NumPy can also locate the missing values in an array
import numpy as np

# Count and locate the missing values in sepallength (the first column of iris_2d)
iris_2d = np.genfromtxt('iris.data', delimiter=',', usecols=(0, 1, 2, 3), dtype=float)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
print("Positions of missing values: \n", np.where(np.isnan(iris_2d[:, 0])))
Output:
Number of missing values: 2
Positions of missing values: (array([36, 56]),)
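Because the code sets no random seed, each run places the NaN values differently. Another run printed:
Number of missing values: 5
Positions of missing values: (array([ 38, 80, 106, 113, 121]),)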
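The same idea extends from one column to the whole array. The sketch below, using a small hypothetical array named `data` instead of the iris file, counts the missing values per column and lists every (row, column) position with np.argwhere.

```python
import numpy as np

# Small hypothetical array with a few missing entries.
data = np.array([[5.1, 3.5, np.nan, 0.2],
                 [np.nan, 3.0, 1.4, 0.2],
                 [4.7, np.nan, 1.3, np.nan]])

mask = np.isnan(data)
per_column = mask.sum(axis=0)   # number of missing values in each column
positions = np.argwhere(mask)   # one (row, column) pair per missing value

print(per_column)  # [1 1 1 1]
print(positions)
```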
Removing rows that contain missing values from a NumPy array
import numpy as np

iris_2d = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0, 1, 2, 3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# print(iris_2d)

# Method 1: ~ negates the result, so the mask is True for rows WITHOUT NaN
any_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
# print(any_nan_in_row)  # prints a boolean array
# print(iris_2d[any_nan_in_row])  # prints the matrix without the rows that contain missing values
print(iris_2d[any_nan_in_row][:5])

# Method 2:
# print(np.isnan(iris_2d))  # returns a boolean array; False + False + False + False == 0
print(iris_2d[np.sum(np.isnan(iris_2d), axis=1) == 0][:5])
Output:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]]
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]]
Another run, again without a fixed seed, kept a different set of rows:
[[4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [4.6 3.4 1.4 0.3]]
[[4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [4.6 3.4 1.4 0.3]]
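Method 1 loops over the rows in Python. A fully vectorized variant is sketched below on a small hypothetical array named `data`: it reduces the NaN mask along axis 1 with any() and negates it, so no explicit loop is needed.

```python
import numpy as np

# Small hypothetical array; rows 1 and 3 contain missing values.
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, np.nan, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2],
                 [np.nan, 3.1, 1.5, np.nan]])

# True for rows in which no element is NaN.
keep = ~np.isnan(data).any(axis=1)
clean = data[keep]

print(keep)   # [ True False  True False]
print(clean)
```

On arrays of this kind the result should match Methods 1 and 2, while the intent (keep rows with no NaN) is stated directly.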
Finding the correlation between two columns of a NumPy array
import numpy as np

# Correlation between SepalLength (column 1) and PetalLength (column 3) in iris_2d
iris_2d = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0, 1, 2, 3])

# Solution 1
print(np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])[0, 1])
# 0.8717541573048718

# Solution 2 (pearsonr is imported from the public scipy.stats namespace)
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris_2d[:, 0], iris_2d[:, 2])
print(corr)
# 0.8717541573048713
Output:
0.8717541573048718
0.8717541573048714
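The last digits can differ slightly between environments because of floating-point rounding. Another run printed:
0.871754157304871
0.8717541573048713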
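Both solutions return the Pearson correlation coefficient. As a cross-check, the sketch below computes it straight from the definition (covariance of the centred values divided by the product of their norms) and compares it with np.corrcoef. The arrays `x` and `y` are hypothetical sample values, not the actual iris columns.

```python
import numpy as np

# Hypothetical sample values standing in for sepal length and petal length.
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1, 6.5])
y = np.array([1.4, 1.4, 6.0, 5.1, 5.9, 5.8])

# Centre both variables, then apply the definition of Pearson's r.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```

The two values should agree up to floating-point rounding, which is also why the printed digits in the lab output differ only in the last place.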
Checking whether an array contains any missing values
import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0, 1, 2, 3])
# True if at least one element is NaN
print(np.isnan(iris_2d).any())
# False
Output:
False
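The original iris file has no gaps, so the check prints False. To see it return True, the sketch below parses a small hypothetical CSV text with one empty field; np.genfromtxt reads a missing float field as NaN. The variable names and the sample text are assumptions made only for this illustration.

```python
import io
import numpy as np

# Hypothetical two-row CSV text; the second field of the second row is empty.
text = "5.1,3.5,1.4,0.2\n4.9,,1.4,0.2\n"

arr = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=float)
has_missing = np.isnan(arr).any()

print(arr)
print(has_missing)
```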
Counting unique values in a NumPy array
import numpy as np

# Find the unique iris species and how many times each occurs
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Extract the species column (column 4) from every row
species = np.array([row.tolist()[4] for row in iris])
print(species)
u, counts = np.unique(species, return_counts=True)
print(u, counts)
Output:
[b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa'b'Iris-setosa' b'Iris-setosa' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-versicolor' b'Iris-versicolor' b'Iris-versicolor'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' 
b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica' b'Iris-virginica' b'Iris-virginica'b'Iris-virginica' b'Iris-virginica']
[b'Iris-setosa' b'Iris-versicolor' b'Iris-virginica'] [50 50 50]
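The same counting step can be tried without downloading anything. The following is a minimal sketch that assumes a small, made-up `species` array of byte strings (not the real dataset) and adds one illustrative step, decoding the labels into ordinary strings:

```python
import numpy as np

# Hypothetical stand-in for the species column (byte strings, as in the output above)
species = np.array([b'Iris-setosa', b'Iris-virginica', b'Iris-setosa',
                    b'Iris-versicolor', b'Iris-virginica', b'Iris-setosa'])

# np.unique returns the sorted distinct values; with return_counts=True
# it also returns how many times each value occurs
u, counts = np.unique(species, return_counts=True)

# Decode each byte string to str and pair it with its count
summary = {name.decode('utf-8'): int(n) for name, n in zip(u, counts)}
print(summary)
# {'Iris-setosa': 3, 'Iris-versicolor': 1, 'Iris-virginica': 2}
```

Because `np.unique` sorts its result, the labels come back in alphabetical order regardless of the order in which they appear in the data.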
Converting numeric values to a categorical (text) array
import numpy as np

# Bin the petal length (3rd column) of the iris data into a text array:
#   less than 3 --> 'small'
#   3 to 5      --> 'medium'
#   5 or more   --> 'large'
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# The bin edges [0, 3, 5, 10] define four intervals: [0, 3), [3, 5), [5, 10) and [10, inf);
# np.digitize returns, for each element, the index (1 to 4) of the interval it falls into
petallength = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
# print(petallength)
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petallength2 = [label_map[x] for x in petallength]
print(petallength2)
The output is:
['small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'large', 'large', 'large', 'large', 'large', 'medium', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'medium', 'large', 'medium', 'large', 'large', 'medium', 'medium', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'medium', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large', 'large']
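To see how `np.digitize` assigns interval indices at the bin edges, here is a small sketch on made-up petal lengths (the `lengths` and `labels` arrays are illustrative assumptions, not part of the dataset). It also shows one possible alternative to the dictionary lookup: indexing a label array directly, which only works if every value falls inside [0, 10).

```python
import numpy as np

lengths = np.array([1.4, 3.0, 4.9, 5.0, 6.7])  # made-up petal lengths
edges = [0, 3, 5, 10]

# With the default right=False, index i means edges[i-1] <= x < edges[i],
# so a value equal to an edge (3.0, 5.0) goes into the upper interval
idx = np.digitize(lengths, edges)
print(idx)              # [1 2 2 3 3]

# Vectorized lookup: shift the 1-based interval index to a 0-based label index
labels = np.array(['small', 'medium', 'large'])
print(labels[idx - 1])  # ['small' 'medium' 'medium' 'large' 'large']
```

The dictionary version in the text handles values of 10 or more by mapping index 4 to `np.nan`; the array lookup above would raise an `IndexError` in that case.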
Sorting a 2D array by a column
import numpy as np

# Sort the dataset by the sepallength column (1st column)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
print(iris[iris[:, 0].argsort()])
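Sorting rows by a column works in two steps: `argsort` returns the row order, and fancy indexing applies it. A minimal sketch on a made-up numeric table (the `table` values are assumptions for illustration only):

```python
import numpy as np

# Made-up 4x2 table; the first column plays the role of sepallength
table = np.array([[5.1, 3.5],
                  [4.9, 3.0],
                  [6.2, 2.9],
                  [4.7, 3.2]])

# argsort returns the row indices that would sort the first column ascending
order = table[:, 0].argsort()
print(order)         # [3 1 0 2]

# Indexing with that order rearranges whole rows, keeping each row intact
sorted_rows = table[order]
print(sorted_rows)
```

Reversing the index array, `table[order[::-1]]`, gives the rows in descending order of the same column.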
Finding the most frequent value in a NumPy array
import numpy as np

# Find the most frequent petal length (petallength, 3rd column) in the iris dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
vals, counts = np.unique(iris[:, 2], return_counts=True)
# print(np.argmax(counts))  # index of the largest count
print(vals[np.argmax(counts)])
The output is:
b'1.5'
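One detail worth knowing about this approach: `np.argmax` returns only the first maximum, so when several values are tied for the highest count, only the smallest of them is reported. The sketch below, on made-up values, shows this and one way to list all tied values:

```python
import numpy as np

values = np.array([1.4, 1.5, 1.4, 1.5, 1.3])  # made-up data with a tie
vals, counts = np.unique(values, return_counts=True)

# argmax picks the first position of the largest count
mode = vals[np.argmax(counts)]
print(mode)    # 1.4

# A boolean mask keeps every value whose count equals the maximum
modes = vals[counts == counts.max()]
print(modes)   # [1.4 1.5]
```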
Finding the position of the first value greater than a given value
import numpy as np

# Find the position of the first value greater than 1.0 in the petalwidth column (4th column)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
print(np.argwhere(iris[:, 3].astype(float) > 1.0)[0])
# print(np.argwhere(iris[:, 3].astype(float) > 1.0))  # returns an (N, 1) array of indices (a column vector)
# print(np.where(iris[:, 3].astype(float) > 1.0))     # returns a tuple containing one index array
The output is:
[50]
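An alternative that is sometimes used for the same task is `np.argmax` on the boolean mask, since the first `True` is the first maximum. The sketch below uses made-up widths and guards against the case where no element matches, because `np.argmax` would otherwise return 0:

```python
import numpy as np

widths = np.array([0.2, 0.4, 1.0, 1.3, 0.9, 1.5])  # made-up petal widths
mask = widths > 1.0

# argmax on a boolean array gives the index of the first True;
# check mask.any() first, since an all-False mask would also yield 0
first = int(np.argmax(mask)) if mask.any() else None
print(first)   # 3

# flatnonzero lists every matching index as a flat 1D array
print(np.flatnonzero(mask))   # [3 5]
```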
Replacing all values greater than a given value with a given cutoff
import numpy as np

# In array a, replace every element greater than 30 with 30 and every element less than 10 with 10
np.set_printoptions(precision=2)
np.random.seed(100)
# Generate an array of 20 random floats drawn uniformly from [1, 50)
a = np.random.uniform(1, 50, 20)
print(a)
# [27.63 14.64 21.8 42.39 1.23 6.96 33.87 41.47 7.7 29.18 44.67 11.25
# 10.08 6.31 11.77 48.95 40.77 9.43 41. 14.43]

# Solution 1: using np.clip
print(np.clip(a, a_min=10, a_max=30))
# [27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25
# 10.08 10. 11.77 30. 30. 10. 30. 14.43]

# Solution 2: using np.where
print(np.where(a < 10, 10, np.where(a > 30, 30, a)))
# [27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25
# 10.08 10. 11.77 30. 30. 10. 30. 14.43]
The three outputs are, in order:
[27.63 14.64 21.8 42.39 1.23 6.96 33.87 41.47 7.7 29.18 44.67 11.25 10.08 6.31 11.77 48.95 40.77 9.43 41. 14.43]
[27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25 10.08 10. 11.77 30. 30. 10. 30. 14.43]
[27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25 10.08 10. 11.77 30. 30. 10. 30. 14.43]
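The two solutions can be checked against each other on a small fixed array, which avoids depending on the random seed. This is a minimal sketch with made-up values; the third variant, `np.minimum` combined with `np.maximum`, is an additional illustration and not one of the solutions from the exercise:

```python
import numpy as np

a = np.array([27.63, 42.39, 1.23, 10.08, 30.0])  # made-up sample values

# Clamp every element into the closed range [10, 30]
clipped = np.clip(a, 10, 30)

# Same result with nested np.where: fix the low end, then the high end
nested = np.where(a < 10, 10, np.where(a > 30, 30, a))

# Same result again by raising to the lower bound, then capping at the upper
bounded = np.minimum(np.maximum(a, 10), 30)

print(clipped)
print(np.array_equal(clipped, nested), np.array_equal(clipped, bounded))  # True True
```

Values already inside the range, including the boundary value 30.0, pass through unchanged in all three variants.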
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># </em></span><span style="color:#00bb00"><em>[</em></span><span style="color:#408080"><em>27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># 10.08 10. 11.77 30. 30. 10. 30. 14.43</em></span><span style="color:#00bb00"><em>]</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
[27.63 14.64 21.8 42.39 1.23 6.96 33.87 41.47 7.7 29.18 44.67 11.25
 10.08 6.31 11.77 48.95 40.77 9.43 41. 14.43]
[27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25
 10.08 10. 11.77 30. 30. 10. 30. 14.43]
[27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25
 10.08 10. 11.77 30. 30. 10. 30. 14.43]
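The two solutions above should agree element by element. The following is a minimal sketch of that check; the seeded `np.random.default_rng` generator is an assumption added here so the run is repeatable, and it is not part of the original exercise.

```python
import numpy as np

# Assumption: a seeded Generator stands in for np.random.uniform so results repeat.
rng = np.random.default_rng(0)
a = rng.uniform(1, 50, 20)

# np.clip limits every value to the closed interval [10, 30].
clipped = np.clip(a, 10, 30)

# The nested np.where performs the same replacement in two steps.
replaced = np.where(a < 10, 10, np.where(a > 30, 30, a))

print(np.array_equal(clipped, replaced))  # True
```

np.clip is usually the more readable choice when both bounds are fixed; np.where is more general because the replacement values do not have to equal the thresholds.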
Creating group IDs from a given categorical variable
import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=[4])
np.random.seed(100)  # random seed
species_small = np.sort(np.random.choice(species, size=20))  # sort

# Method 1:
# output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]

# Method 2: loop over the values
output = []
uniqs = np.unique(species_small)
for val in uniqs:  # each unique value (one per group)
    for s in species_small[species_small==val]:  # each element in that group
        groupid = np.argwhere(uniqs == s).tolist()[0][0]  # group ID
        output.append(groupid)
print(output)
The output is as follows:
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># Write your code here</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#008000"><strong>import</strong></span> <span style="color:#000000">numpy</span> <span style="color:#008000"><strong>as</strong></span> <span style="color:#000000">np</span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">url</span> <span style="color:#aa22ff"><strong>=</strong></span> <span style="color:#ba2121">'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'</span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">species</span> <span style="color:#aa22ff"><strong>=</strong></span> <span style="color:#000000">np</span>.genfromtxt(<span style="color:#000000">url</span>, <span style="color:#000000">delimiter</span><span style="color:#aa22ff"><strong>=</strong></span><span style="color:#ba2121">','</span>, <span style="color:#000000">dtype</span><span style="color:#aa22ff"><strong>=</strong></span><span style="color:#ba2121">'str'</span>, <span style="color:#000000">usecols</span><span style="color:#aa22ff"><strong>=</strong></span>[<span style="color:#008800">4</span>])</span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">np</span>.random.seed(<span style="color:#008800">100</span>) <span style="color:#408080"><em># random seed</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">species_small</span> <span style="color:#aa22ff"><strong>=</strong></span> <span style="color:#000000">np</span>.sort(<span style="color:#000000">np</span>.random.choice(<span style="color:#000000">species</span>, <span style="color:#000000">size</span><span style="color:#aa22ff"><strong>=</strong></span><span style="color:#008800">20</span>)) <span style="color:#408080"><em># sort</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># Method 1:</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#408080"><em># Method 2: loop over the values</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">output</span> <span style="color:#aa22ff"><strong>=</strong></span> []</span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#000000">uniqs</span> <span style="color:#aa22ff"><strong>=</strong></span> <span style="color:#000000">np</span>.unique(<span style="color:#000000">species_small</span>)</span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#008000"><strong>for</strong></span> <span style="color:#000000">val</span> <span style="color:#008000"><strong>in</strong></span> <span style="color:#000000">uniqs</span>: <span style="color:#408080"><em># each unique value (one per group)</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit">    <span style="color:#008000"><strong>for</strong></span> <span style="color:#000000">s</span> <span style="color:#008000"><strong>in</strong></span> <span style="color:#000000">species_small</span>[<span style="color:#000000">species_small</span><span style="color:#aa22ff"><strong>==</strong></span><span style="color:#000000">val</span>]: <span style="color:#408080"><em># each element in that group</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit">        <span style="color:#000000">groupid</span> <span style="color:#aa22ff"><strong>=</strong></span> <span style="color:#000000">np</span>.argwhere(<span style="color:#000000">uniqs</span> <span style="color:#aa22ff"><strong>==</strong></span> <span style="color:#000000">s</span>).tolist()[<span style="color:#008800">0</span>][<span style="color:#008800">0</span>] <span style="color:#408080"><em># group ID</em></span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit">        <span style="color:#000000">output</span>.append(<span style="color:#000000">groupid</span>)</span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"><span style="color:#008000">print</span><span style="color:#00bb00">(</span><span style="color:#000000">output</span><span style="color:#00bb00">)</span></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
<span style="background-color:#ffffff"><span style="color:#000000"><span style="background-color:#f7f7f7"><span style="color:black"><span style="color:inherit"></span></span></span></span></span>
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
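The nested loop above looks up each element's position among the unique values one at a time. As an alternative that the original does not use, `np.unique` can return those positions directly through `return_inverse=True`. The sketch below assumes a small hypothetical sample in place of `species_small`.

```python
import numpy as np

# Hypothetical sample standing in for species_small (already sorted).
species_small = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
                          'Iris-virginica', 'Iris-virginica'])

# return_inverse=True yields, for each element, the index of its value
# in the sorted array of unique values, which serves as the group ID.
uniqs, group_ids = np.unique(species_small, return_inverse=True)

print(uniqs)
print(group_ids.tolist())  # [0, 0, 1, 2, 2]
```

Unlike the loop, this version keeps the IDs in the order of the input array, so it also works when the input is not sorted.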
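III. Summary of the Experiment
The purpose of this experiment is twofold: first, to become more fluent with NumPy; second, and more importantly, to get a first sense of what data processing involves by applying NumPy to the iris data in various ways. Many of the methods used here will also come up frequently in later data analysis work.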
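IV. Reflection and Exercises
In this experiment we used NumPy to process the iris data, covering correlation, normalization, missing-value handling, and more. Can you think of other processing methods? Try them out and verify the results.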
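Summary:
1. Learned what the NumPy library provides
- ndarray, a fast, space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting.
- Standard mathematical functions for fast operations on entire arrays of data (no loops required).
- Tools for reading and writing data on disk and for working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for integrating code written in C, C++, Fortran, and similar languages.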
2. Learned how to acquire and process data
Core function 1: genfromtxt, which reads data directly from a URL.
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
Core function 2: np.array(), which creates an array. The code extracts the fifth column (index 4) of iris_1d and prints its first five values.
species = np.array([row[4] for row in iris_1d])
print(species[:5])
Core function 3: row.tolist()[:4], which takes the first four columns of each row.
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
Core function 4: usecols=[0,1,2,3], which reads only the first four columns of each row from the data source.
iris_2d = np.genfromtxt('iris.data', delimiter=',', dtype='float', usecols=[0, 1, 2, 3])
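To try usecols without downloading anything, the same call can be pointed at an in-memory text buffer. This is only a sketch: the three rows below are a hypothetical excerpt in the same comma-separated layout as iris.data, and `io.StringIO` stands in for the file.

```python
import io
import numpy as np

# Hypothetical three-row excerpt in the same layout as iris.data.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# usecols keeps the four numeric columns and leaves out the text column.
iris_2d = np.genfromtxt(sample, delimiter=',', dtype='float', usecols=[0, 1, 2, 3])

print(iris_2d.shape)  # (3, 4)
```

Selecting the columns at load time avoids building the structured array first and slicing each row afterwards, as core function 3 does.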
Core function 5: np.mean(), np.median(), and np.std(), which return the mean, median, and standard deviation of a given variable.
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
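One detail worth knowing about np.std is that it computes the population standard deviation by default (ddof=0). The sketch below uses hypothetical values in place of the sepallength column to show the difference from the sample standard deviation (ddof=1).

```python
import numpy as np

# Hypothetical values standing in for the sepallength column.
sepallength = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

mu, med = np.mean(sepallength), np.median(sepallength)

# np.std defaults to ddof=0 (population standard deviation);
# ddof=1 gives the sample standard deviation instead.
sd_population = np.std(sepallength)
sd_sample = np.std(sepallength, ddof=1)

print(mu, med)                    # 4.0 3.0
print(sd_population < sd_sample)  # True
```

The single large value pulls the mean above the median, which is a quick way to spot skew in a column.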
Core function 6: .max() and .min(), which return the maximum and minimum of the data and can be used to normalize an array.
Smax, Smin = sepallength.max(), sepallength.min()
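One common way to use these two values is min-max scaling, which rescales the data to the range 0 to 1. The following sketch assumes that formula and a few hypothetical values in place of the real sepallength column.

```python
import numpy as np

# Hypothetical sepal-length values standing in for the sepallength column.
sepallength = np.array([5.1, 4.9, 7.0, 6.3, 5.8])

Smax, Smin = sepallength.max(), sepallength.min()

# Min-max scaling maps the smallest value to 0 and the largest to 1.
S = (sepallength - Smin) / (Smax - Smin)

print(S.min(), S.max())  # 0.0 1.0
```

The division fails to give a meaningful result when all values are equal (Smax == Smin), so a real pipeline would need to guard against that case.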
Core function 7: np.percentile(), which finds the percentiles of a variable (here the 5th and 95th).
print(np.percentile(sepallength, q=[5, 95]))
Core function 8: np.random.choice(), which randomly draws 20 elements from a variable.
np.random.choice((i), 20)
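Core function 9: np.random.randint(), which generates random row and column indices. Here they are used to set up to 20 randomly chosen positions to the missing value np.nan (random indices may repeat).
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan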
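Core function 10: ~np.any(np.isnan(row)), which is True for rows that contain no missing values. Despite its name, any_nan_in_row marks the rows without missing values.
any_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])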
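The same mask can be built without a Python-level loop by reducing along the column axis. This is a sketch on a small hypothetical array, and the name `complete_rows` is introduced here for illustration.

```python
import numpy as np

# Hypothetical 2-D array with missing values in rows 1 and 2.
iris_2d = np.array([[5.1, 3.5, 1.4, 0.2],
                    [4.9, np.nan, 1.4, 0.2],
                    [np.nan, 3.2, 1.3, 0.2],
                    [4.6, 3.1, 1.5, 0.2]])

# True where a row contains no NaN at all.
complete_rows = ~np.isnan(iris_2d).any(axis=1)

print(complete_rows.tolist())        # [True, False, False, True]
print(iris_2d[complete_rows].shape)  # (2, 4)
```

Indexing with the boolean mask keeps only the complete rows, which is usually the next step after locating missing values.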
Core function 11: np.corrcoef(), which computes the correlation coefficient between two columns.
print(np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])[0, 1])
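The `[0, 1]` index is needed because np.corrcoef returns a full correlation matrix rather than a single number. A minimal sketch with two hypothetical, perfectly linear columns:

```python
import numpy as np

# Hypothetical columns: y is an exact linear function of x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation of x with y.
matrix = np.corrcoef(x, y)
r = matrix[0, 1]

print(matrix.shape)  # (2, 2)
print(round(r, 6))   # 1.0
```

The diagonal entries are each variable's correlation with itself, so only the off-diagonal entry carries information here.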
Core function 12: np.isnan() combined with .any(), which checks whether the array contains any missing values.
print(np.isnan(iris_2d).any())
Core function 13: np.unique(), which returns the unique values and, with return_counts=True, how often each one occurs.
u, counts = np.unique(species, return_counts=True)
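The two returned arrays are aligned, so they can be zipped into a value-to-count mapping. A short sketch on a hypothetical species array:

```python
import numpy as np

# Hypothetical sample standing in for the species column.
species = np.array(['Iris-setosa', 'Iris-virginica', 'Iris-setosa',
                    'Iris-versicolor', 'Iris-setosa'])

u, counts = np.unique(species, return_counts=True)

# Pair each unique value with its count.
print(dict(zip(u.tolist(), counts.tolist())))
# {'Iris-setosa': 3, 'Iris-versicolor': 1, 'Iris-virginica': 1}
```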
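Core function 14: np.digitize(), which assigns each number to a bin and returns the bin index; the indices can then be mapped to text labels.
petallength = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])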
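np.digitize on its own only produces integer bin indices; turning them into text takes a second mapping step. The sketch below uses hypothetical petal lengths, and the label names in `label_map` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical petal lengths.
petallength = np.array([1.4, 4.7, 6.0, 3.0, 5.0])

# With bins [0, 3, 5, 10], np.digitize returns i such that bins[i-1] <= x < bins[i].
bin_ids = np.digitize(petallength, [0, 3, 5, 10])

# Assumed label mapping for illustration; bin 4 would hold values >= 10.
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
labels = [label_map[i] for i in bin_ids.tolist()]

print(bin_ids.tolist())  # [1, 2, 3, 2, 3]
print(labels)            # ['small', 'medium', 'large', 'medium', 'large']
```

Note that a value exactly on a bin edge (3.0 or 5.0 here) falls into the bin that starts at that edge.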
Core function 15: argsort(), used to sort the 2-D iris array by its first column.
print(iris[iris[:, 0].argsort()])
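The expression works in two steps: argsort returns the row order that would sort the first column, and indexing with that order rearranges whole rows. A sketch on a hypothetical three-row array:

```python
import numpy as np

# Hypothetical 2-D array; each row is one flower.
iris = np.array([[5.1, 3.5],
                 [4.7, 3.2],
                 [4.9, 3.0]])

# argsort on column 0 gives the row order that sorts that column ascending.
order = iris[:, 0].argsort()
sorted_iris = iris[order]

print(order.tolist())  # [1, 2, 0]
print(sorted_iris)
```

Because entire rows are moved, each value in the second column stays attached to its original first-column value.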
Core function 16: np.argwhere(), which finds the position of the first value in a column that is greater than a given value.
print(np.argwhere(iris[:, 3].astype(float) > 1.0)[0])
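np.argwhere returns the indices of every matching element, and the trailing `[0]` picks the first of them. A minimal sketch on a hypothetical petal-width column:

```python
import numpy as np

# Hypothetical petal-width values.
petalwidth = np.array([0.2, 0.4, 1.3, 0.3, 1.5])

# np.argwhere lists the indices of all matching elements; [0] takes the first.
matches = np.argwhere(petalwidth > 1.0)

print(matches.shape)  # (2, 1)
print(matches[0])     # [2]
```

If no element satisfies the condition, `matches` is empty and `matches[0]` raises an IndexError, so it is worth checking `matches.size` first.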
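Core function 17: np.clip(), which limits values to a given range: values below a_min become a_min and values above a_max become a_max.
print(np.clip(a, a_min=10, a_max=30))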
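Core function 18: creating group IDs.
output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]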
In summary, repeated practice and writing code will improve your ability to acquire and process data with the NumPy library.