Pandas Summary Quiz


# Import pandas
import pandas as pd
# 0. Create an empty DataFrame
df = pd.DataFrame()
df
# Create a DataFrame with student names, ages, and scores
# The data table is:
'''
姓名	年龄	成绩
张三	18	85
李四	19	90
王五	20	78
赵六	21	92
'''
data = {'姓名': ['张三', '李四', '王五', '赵六'], '年龄': [18, 19, 20, 21], '成绩': [85, 90, 78, 92]}
df = pd.DataFrame(data)
df
   姓名  年龄  成绩
0  张三  18  85
1  李四  19  90
2  王五  20  78
3  赵六  21  92
# View the first 3 rows and last 2 rows of the DataFrame
print(df.head(3))
print(df.tail(2))
   姓名  年龄  成绩
0  张三  18  85
1  李四  19  90
2  王五  20  78
   姓名  年龄  成绩
2  王五  20  78
3  赵六  21  92
# Read the data1.csv file and assign it to df1
df1 = pd.read_csv('data1.csv', sep='\t')
df1
      order_id  quantity                              item_name                                 choice_description  item_price
0            1         1           Chips and Fresh Tomato Salsa                                                NaN       $2.39
1            1         1                                   Izze                                       [Clementine]       $3.39
2            1         1                       Nantucket Nectar                                            [Apple]       $3.39
3            1         1  Chips and Tomatillo-Green Chili Salsa                                                NaN       $2.39
4            2         2                           Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...      $16.98
...        ...       ...                                    ...                                                ...         ...
4617      1833         1                          Steak Burrito  [Fresh Tomato Salsa, [Rice, Black Beans, Sour ...      $11.75
4618      1833         1                          Steak Burrito  [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese...      $11.75
4619      1834         1                     Chicken Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Pinto...      $11.25
4620      1834         1                     Chicken Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...       $8.75
4621      1834         1                     Chicken Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Pinto...       $8.75

4622 rows × 5 columns

# Save df1 as a data1.xlsx file
df1.to_excel('data1.xlsx', index=False)
# Create a one-dimensional Series containing the numbers 1 to 10
s = pd.Series(range(1, 11))
s
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
# Convert the DataFrame `df` to a NumPy array
df.values
array([['张三', 18, 85],
       ['李四', 19, 90],
       ['王五', 20, 78],
       ['赵六', 21, 92]], dtype=object)
# Convert the DataFrame `df` to a dictionary
df.to_dict()
{'姓名': {0: '张三', 1: '李四', 2: '王五', 3: '赵六'},
 '年龄': {0: 18, 1: 19, 2: 20, 3: 21},
 '成绩': {0: 85, 1: 90, 2: 78, 3: 92}}
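A side note: recent pandas documentation recommends DataFrame.to_numpy() over the .values attribute, and to_dict() also accepts an orient argument; a minimal sketch with the same df as above (the orient choice is just an example):
# to_numpy() makes the array conversion explicit
arr = df.to_numpy()
# orient='records' returns one dictionary per row instead of one per column
records = df.to_dict(orient='records')
print(arr.shape, records[0])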
# Read data2.xlsx, where the data table is in Sheet2, and assign it to df2
df2 = pd.read_excel('data2.xlsx', sheet_name='Sheet2')
df2
# The header of df2 turns out to be problematic: skip 3 rows, read the data directly, and set the 60 column headers to:
# ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 
# 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A20',
# 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A30',
# 'A31', 'A32', 'A33', 'A34', 'A35', 'A36', 'A37', 'A38', 'A39', 'A40',
# 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A50',
# 'A51', 'A52', 'A53', 'A54', 'A55', 'A56', 'A57', 'A58', 'A59', 'A60']
df2 = pd.read_excel('data2.xlsx', sheet_name='Sheet2', skiprows=3, header=0)
df2.columns = ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10',
               'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A20',
               'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A30',
               'A31', 'A32', 'A33', 'A34', 'A35', 'A36', 'A37', 'A38', 'A39', 'A40',
               'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A50',
               'A51', 'A52', 'A53', 'A54', 'A55', 'A56', 'A57', 'A58', 'A59', 'A60']
df2.head(1)
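Depending on how the sheet is actually laid out, the same result can often be obtained in one call by supplying the column names directly; a hedged sketch (the skiprows value below is an assumption and may need adjusting for the real file):
# Read with no header row and pass the 60 generated column names
cols = [f'A{i}' for i in range(1, 61)]
df2_alt = pd.read_excel('data2.xlsx', sheet_name='Sheet2',
                        skiprows=4, header=None, names=cols)
df2_alt.head(1)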
# Read columns A16 and A22 as dates, or convert them to datetime format
df2['A16'] = pd.to_datetime(df2['A16'])
df2['A22'] = pd.to_datetime(df2['A22'], format='%Y%m%d')
print(df2['A16'])
print(df2['A22'])
0       2016-01-01
1       2016-01-01
2       2022-10-26
3       2017-01-01
4       2016-01-01
           ...
23946   2016-01-01
23947   2020-07-01
23948   2018-01-01
23949   2016-01-01
23950   2016-01-01
Name: A16, Length: 23951, dtype: datetime64[ns]
0       2023-12-17
1       2023-12-17
2       2023-12-17
3       2023-12-17
4       2023-12-17
           ...
23946   2023-12-17
23947   2023-12-17
23948   2023-12-17
23949   2023-12-17
23950   2023-12-17
Name: A22, Length: 23951, dtype: datetime64[ns]
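If either column contained malformed date strings, pd.to_datetime would raise by default; a small sketch of the errors='coerce' option, which turns unparseable values into NaT instead (the example values are illustrative only):
# Unparseable strings become NaT instead of raising an exception
pd.to_datetime(pd.Series(['20231217', 'not-a-date']), format='%Y%m%d', errors='coerce')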
from sklearn import datasets
# Load the iris dataset and assign it to iris
iris_ = datasets.load_iris()
iris = pd.DataFrame(data=iris_.data, columns=iris_.feature_names)
iris['target'] = iris_.target
iris
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                  5.1               3.5                1.4               0.2       0
1                  4.9               3.0                1.4               0.2       0
2                  4.7               3.2                1.3               0.2       0
3                  4.6               3.1                1.5               0.2       0
4                  5.0               3.6                1.4               0.2       0
..                 ...               ...                ...               ...     ...
145                6.7               3.0                5.2               2.3       2
146                6.3               2.5                5.0               1.9       2
147                6.5               3.0                5.2               2.0       2
148                6.2               3.4                5.4               2.3       2
149                5.9               3.0                5.1               1.8       2

150 rows × 5 columns

# View basic information about iris
iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int32
dtypes: float64(4), int32(1)
memory usage: 5.4 KB
# View descriptive statistics of iris
iris.describe()
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000        150.000000        150.000000  150.000000
mean            5.843333          3.057333          3.758000          1.199333    1.000000
std             0.828066          0.435866          1.765298          0.762238    0.819232
min             4.300000          2.000000          1.000000          0.100000    0.000000
25%             5.100000          2.800000          1.600000          0.300000    0.000000
50%             5.800000          3.000000          4.350000          1.300000    1.000000
75%             6.400000          3.300000          5.100000          1.800000    2.000000
max             7.900000          4.400000          6.900000          2.500000    2.000000
# Using the output of describe(), show the quartiles of iris
iris.describe().loc[['25%', '50%', '75%'], :]
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
25%                5.1               2.8               1.60               0.3     0.0
50%                5.8               3.0               4.35               1.3     1.0
75%                6.4               3.3               5.10               1.8     2.0
# Alternatively, compute the quartiles of iris directly with quantile()
iris.quantile([0.25, 0.5, 0.75])
      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0.25                5.1               2.8               1.60               0.3     0.0
0.50                5.8               3.0               4.35               1.3     1.0
0.75                6.4               3.3               5.10               1.8     2.0
# Check whether iris contains any missing values
iris.isnull().sum()
sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
target               0
dtype: int64
# Reload the iris dataset and assign it to iris
iris_ = datasets.load_iris()
iris = pd.DataFrame(data=iris_.data, columns=iris_.feature_names)
iris['target'] = iris_.target
# From each of the first four columns, randomly pick 2 values and replace them with missing values
import numpy as np
for column in iris.columns[:4]:  # process only the first four columns
    # randomly select 2 row indices
    random_indices = np.random.choice(iris.index, 2, replace=False)
    # replace the selected values with NaN
    iris.loc[random_indices, column] = np.nan
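Because np.random.choice is not seeded, the affected rows change on every run; a minimal sketch of a reproducible variant of the same loop using NumPy's Generator API (the seed value is arbitrary):
rng = np.random.default_rng(42)  # fixed seed so the injected missing cells are reproducible
for column in iris.columns[:4]:
    random_indices = rng.choice(iris.index, 2, replace=False)
    iris.loc[random_indices, column] = np.nan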
# View the rows that contain missing values
iris[iris.isnull().any(axis=1)]
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
7                  5.0               3.4                1.5               NaN       0
8                  NaN               2.9                1.4               0.2       0
39                 5.1               3.4                NaN               0.2       0
71                 6.1               2.8                4.0               NaN       1
75                 6.6               NaN                4.4               1.4       1
86                 6.7               3.1                NaN               1.5       1
115                6.4               NaN                5.3               2.3       2
137                NaN               3.1                5.5               1.8       2
# Each row with a missing value also has a target value; fill the missing values with the mean of the rows that share the same target
iris = iris.fillna(iris.groupby('target').transform('mean'))
# Verify
iris[iris.isnull().any(axis=1)]
sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
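The key point above is that groupby('target').transform('mean') returns a frame aligned on the same index as iris, so fillna() can match each missing cell to its own class mean; a small sketch that makes the alignment visible:
# Per-row class means, index-aligned with iris, which is what fillna() consumed above
group_means = iris.groupby('target').transform('mean')
print(group_means.shape)    # same number of rows as iris
print(group_means.head(3))  # the first rows carry the class-0 column means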
# Count how many times each target value appears
iris['target'].value_counts()
target
0    50
1    50
2    50
Name: count, dtype: int64
# After dropping the first row, count how many times each target value appears
iris.drop(0, axis=0, inplace=True)
iris['target'].value_counts()
target
1    50
2    50
0    49
Name: count, dtype: int64
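value_counts can also report proportions instead of raw counts; a minimal sketch using normalize=True:
# Relative frequency of each class after dropping the first row
iris['target'].value_counts(normalize=True)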
# Rename the columns to Chinese
# ['花萼长度', '花萼宽度', '花瓣长度', '花瓣宽度', '类别']
iris.columns = ['花萼长度', '花萼宽度', '花瓣长度', '花瓣宽度', '类别']
iris
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
1     4.9     3.0     1.4     0.2    0
2     4.7     3.2     1.3     0.2    0
3     4.6     3.1     1.5     0.2    0
4     5.0     3.6     1.4     0.2    0
5     5.4     3.9     1.7     0.4    0
..    ...     ...     ...     ...  ...
145   6.7     3.0     5.2     2.3    2
146   6.3     2.5     5.0     1.9    2
147   6.5     3.0     5.2     2.0    2
148   6.2     3.4     5.4     2.3    2
149   5.9     3.0     5.1     1.8    2

149 rows × 5 columns

# Compute the correlation coefficient between 花萼长度 (sepal length) and 花萼宽度 (sepal width)
iris['花萼长度'].corr(iris['花萼宽度'])
-0.11924860426893708
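To see all pairwise correlations at once rather than a single pair, corr() can be called on the measurement columns; a small sketch (the 类别 column is excluded to keep the matrix to the four numeric features):
# Full 4x4 correlation matrix of the measurement columns
iris[['花萼长度', '花萼宽度', '花瓣长度', '花瓣宽度']].corr()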
# Filter rows where 花萼长度 (sepal length) is greater than 7
iris[iris['花萼长度'] > 7]
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
102   7.1     3.0     5.9     2.1    2
105   7.6     3.0     6.6     2.1    2
107   7.3     2.9     6.3     1.8    2
109   7.2     3.6     6.1     2.5    2
117   7.7     3.8     6.7     2.2    2
118   7.7     2.6     6.9     2.3    2
122   7.7     2.8     6.7     2.0    2
125   7.2     3.2     6.0     1.8    2
129   7.2     3.0     5.8     1.6    2
130   7.4     2.8     6.1     1.9    2
131   7.9     3.8     6.4     2.0    2
135   7.7     3.0     6.1     2.3    2
# Slice out rows where 花萼长度 > 7 and 花萼宽度 < 3
iris[(iris['花萼长度'] > 7) & (iris['花萼宽度'] < 3)]
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
107   7.3     2.9     6.3     1.8    2
118   7.7     2.6     6.9     2.3    2
122   7.7     2.8     6.7     2.0    2
130   7.4     2.8     6.1     1.9    2
# Slice the third through fifth rows of the data
iris.iloc[2:5, :]
   花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
3   4.6     3.1     1.5     0.2    0
4   5.0     3.6     1.4     0.2    0
5   5.4     3.9     1.7     0.4    0
# Slice the 花萼长度 and 花萼宽度 columns of the third through fifth rows
iris.iloc[2:5, [0, 1]]
   花萼长度  花萼宽度
3   4.6     3.1
4   5.0     3.6
5   5.4     3.9
# Filter rows where 类别 == 2 and show the first 4
iris[iris['类别'] == 2].head(4)
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
100   6.3     3.3     6.0     2.5    2
101   5.8     2.7     5.1     1.9    2
102   7.1     3.0     5.9     2.1    2
103   6.3     2.9     5.6     1.8    2
# Filter rows where 类别 == 1 and take the 3rd-5th rows
iris[iris['类别'] == 1].iloc[2:5, :]
    花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
52   6.9     3.1     4.9     1.5    1
53   5.5     2.3     4.0     1.3    1
54   6.5     2.8     4.6     1.5    1
# Filter rows where 花萼长度 > 7 and sort them by 花瓣长度 in descending order
iris[iris['花萼长度'] > 7].sort_values(by='花瓣长度', ascending=False)
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
118   7.7     2.6     6.9     2.3    2
117   7.7     3.8     6.7     2.2    2
122   7.7     2.8     6.7     2.0    2
105   7.6     3.0     6.6     2.1    2
131   7.9     3.8     6.4     2.0    2
107   7.3     2.9     6.3     1.8    2
109   7.2     3.6     6.1     2.5    2
130   7.4     2.8     6.1     1.9    2
135   7.7     3.0     6.1     2.3    2
125   7.2     3.2     6.0     1.8    2
102   7.1     3.0     5.9     2.1    2
129   7.2     3.0     5.8     1.6    2
# Sort iris by 花萼长度 in descending order and reset the index
iris.sort_values(by='花萼长度', ascending=False).reset_index(drop=True)
     花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
0     7.9     3.8     6.4     2.0    2
1     7.7     3.0     6.1     2.3    2
2     7.7     2.8     6.7     2.0    2
3     7.7     3.8     6.7     2.2    2
4     7.7     2.6     6.9     2.3    2
..    ...     ...     ...     ...  ...
144   4.6     3.6     1.0     0.2    0
145   4.5     2.3     1.3     0.3    0
146   4.4     3.0     1.3     0.2    0
147   4.4     3.2     1.3     0.2    0
148   4.3     3.0     1.1     0.1    0

149 rows × 5 columns

# Create a DataFrame suitable for practicing groupby operations
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 3, 2, 5, 4, 1, 2, 3],
                   'D': [2, 4, 5, 5, 1, 2, 4, 4],
                   'E': [1, 2, 3, 4, 5, 6, 7, 8],
                   'F': [2, 3, 4, 1, 2, 3, 4, 4]})
df
     A      B  C  D  E  F
0  foo    one  1  2  1  2
1  bar    one  3  4  2  3
2  foo    two  2  5  3  4
3  bar  three  5  5  4  1
4  foo    two  4  1  5  2
5  bar    two  1  2  6  3
6  foo    one  2  4  7  4
7  foo  three  3  4  8  4
# Group by columns A and C and compute the mean of column D
df.groupby(['A', 'C'])['D'].mean()
A    C
bar  1    2.0
     3    4.0
     5    5.0
foo  1    2.0
     2    4.5
     3    4.0
     4    1.0
Name: D, dtype: float64
# Group by column A and aggregate column C: mean, max, min, standard deviation, and variance
df.groupby('A')['C'].agg(['mean', 'max', 'min', 'std', 'var'])
     mean  max  min       std  var
A
bar   3.0    5    1  2.000000  4.0
foo   2.4    4    1  1.140175  1.3
# Rename the aggregated columns: {'mean': '均值', 'max': '最大值', 'min': '最小值', 'std': '标准差', 'var': '方差'}
df.groupby('A')['C'].agg(['mean', 'max', 'min', 'std', 'var']) \
    .rename(columns={'mean': '均值', 'max': '最大值', 'min': '最小值', 'std': '标准差', 'var': '方差'})
     均值  最大值  最小值     标准差  方差
A
bar  3.0     5     1  2.000000  4.0
foo  2.4     4     1  1.140175  1.3
# Group by column A, sum column C, and compute the mean of column D
df.groupby('A').agg({'C': 'sum', 'D': 'mean'})
      C         D
A
bar   9  3.666667
foo  12  3.200000
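Recent pandas versions also support named aggregation, which sets the output column names in the same call instead of renaming afterwards; a minimal sketch (the output column names here are my own choice):
# One output column per keyword argument: (source column, aggregation function)
df.groupby('A').agg(C_sum=('C', 'sum'), C_mean=('C', 'mean'), D_mean=('D', 'mean'))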
# Group by column A and use apply on column C to multiply each value by 2
df.groupby(['A'])['C'].apply(lambda x: x * 2)
A
bar  1     6
     3    10
     5     2
foo  0     2
     2     4
     4     8
     6     4
     7     6
Name: C, dtype: int64
# Map the values of column B to the corresponding integers
df['B'] = df['B'].map({'one': 1, 'two': 2, 'three': 3})
df
     A  B  C  D  E  F
0  foo  1  1  2  1  2
1  bar  1  3  4  2  3
2  foo  2  2  5  3  4
3  bar  3  5  5  4  1
4  foo  2  4  1  5  2
5  bar  2  1  2  6  3
6  foo  1  2  4  7  4
7  foo  3  3  4  8  4
# In column A, append an 'o' to every foo and prepend an 'r' to every bar
df['A'] = df['A'].str.replace('foo', 'fooo').str.replace('bar', 'rbar')
df
      A  B  C  D  E  F
0  fooo  1  1  2  1  2
1  rbar  1  3  4  2  3
2  fooo  2  2  5  3  4
3  rbar  3  5  5  4  1
4  fooo  2  4  1  5  2
5  rbar  2  1  2  6  3
6  fooo  1  2  4  7  4
7  fooo  3  3  4  8  4
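Note that in current pandas versions str.replace treats the pattern as a literal string by default (regex=False); if a regular expression is wanted, pass regex=True explicitly. A small self-contained sketch (the example Series is my own):
s = pd.Series(['foo', 'bar', 'foobar'])
# Literal replacement: the default, metacharacters are not interpreted
print(s.str.replace('foo', 'fooo'))
# Regex replacement must be requested explicitly
print(s.str.replace(r'^bar$', 'rbar', regex=True))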
# Create two DataFrames to practice merge
# df1: user id, user name, billing type
df1 = pd.DataFrame({'id': ['001', '002', '003', '004'],
                    'name': ['Alice', 'Bob', 'Charlie', 'David'],
                    'type': ['A', 'B', 'A', 'B']})
# df2: user id, phone number, electricity bill
df2 = pd.DataFrame({'id': ['001', '002', '003', '004', '005'],
                    'phone': ['123456789', '987654321', '111111111', '222222222', '333333333'],
                    'bill': [100, 200, 300, 400, 500]})
print(df1)
print(df2)
    id     name type
0  001    Alice    A
1  002      Bob    B
2  003  Charlie    A
3  004    David    B
    id      phone  bill
0  001  123456789   100
1  002  987654321   200
2  003  111111111   300
3  004  222222222   400
4  005  333333333   500
# Inner join df1 and df2 on the id column
df1.merge(df2, on='id', how='inner')
    id     name type      phone  bill
0  001    Alice    A  123456789   100
1  002      Bob    B  987654321   200
2  003  Charlie    A  111111111   300
3  004    David    B  222222222   400
# Left join df1 and df2 on the id column
df1.merge(df2, on='id', how='left')
    id     name type      phone  bill
0  001    Alice    A  123456789   100
1  002      Bob    B  987654321   200
2  003  Charlie    A  111111111   300
3  004    David    B  222222222   400
# Right join df1 and df2 on the id column
df1.merge(df2, on='id', how='right')
    id     name type      phone  bill
0  001    Alice    A  123456789   100
1  002      Bob    B  987654321   200
2  003  Charlie    A  111111111   300
3  004    David    B  222222222   400
4  005      NaN  NaN  333333333   500
# Outer join df1 and df2 on the id column
df1.merge(df2, on='id', how='outer')
    id     name type      phone  bill
0  001    Alice    A  123456789   100
1  002      Bob    B  987654321   200
2  003  Charlie    A  111111111   300
3  004    David    B  222222222   400
4  005      NaN  NaN  333333333   500
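When comparing join types it can help to ask merge where each row came from; the indicator option adds a _merge column with 'both', 'left_only', or 'right_only'. A minimal sketch:
# The extra _merge column records the provenance of every row
df1.merge(df2, on='id', how='outer', indicator=True)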
# Use concat to stack df1 and df2 vertically
pd.concat([df1, df2], axis=0)
    id     name type      phone   bill
0  001    Alice    A        NaN    NaN
1  002      Bob    B        NaN    NaN
2  003  Charlie    A        NaN    NaN
3  004    David    B        NaN    NaN
0  001      NaN  NaN  123456789  100.0
1  002      NaN  NaN  987654321  200.0
2  003      NaN  NaN  111111111  300.0
3  004      NaN  NaN  222222222  400.0
4  005      NaN  NaN  333333333  500.0
# Use concat to join df1 and df2 horizontally
pd.concat([df1, df2], axis=1)
    id     name type   id      phone  bill
0  001    Alice    A  001  123456789   100
1  002      Bob    B  002  987654321   200
2  003  Charlie    A  003  111111111   300
3  004    David    B  004  222222222   400
4  NaN      NaN  NaN  005  333333333   500
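In the horizontal case concat aligns on the index, which is why row 4 gets NaN in df1's columns; a small sketch of join='inner', which keeps only the index labels present in both frames:
# Keep only the row labels shared by both frames (here: 0-3)
pd.concat([df1, df2], axis=1, join='inner')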
# Create a date range of 10 dates starting from 2021-01-01
pd.date_range(start='2021-01-01', periods=10)
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10'],
              dtype='datetime64[ns]', freq='D')
# Read the data3.csv file
df3 = pd.read_csv('data3.csv')
df3
            Date   Open   High    Low  Close     Volume  Adj Close
0     2014-07-08  96.27  96.80  93.92  95.35   65130000      95.35
1     2014-07-07  94.14  95.99  94.10  95.97   56305400      95.97
2     2014-07-03  93.67  94.10  93.20  94.03   22891800      94.03
3     2014-07-02  93.87  94.06  93.09  93.48   28420900      93.48
4     2014-07-01  93.52  94.07  93.13  93.52   38170200      93.52
...          ...    ...    ...    ...    ...        ...        ...
8460  1980-12-18  26.63  26.75  26.63  26.63   18362400       0.41
8461  1980-12-17  25.87  26.00  25.87  25.87   21610400       0.40
8462  1980-12-16  25.37  25.37  25.25  25.25   26432000       0.39
8463  1980-12-15  27.38  27.38  27.25  27.25   43971200       0.42
8464  1980-12-12  28.75  28.87  28.75  28.75  117258400       0.45

8465 rows × 7 columns

# View the data type of each column of data3.csv
df3.dtypes
Date          object
Open         float64
High         float64
Low          float64
Close        float64
Volume         int64
Adj Close    float64
dtype: object
# Convert the 'Date' column of data3.csv to datetime
df3['Date'] = pd.to_datetime(df3['Date'])
df3
            Date   Open   High    Low  Close     Volume  Adj Close
0     2014-07-08  96.27  96.80  93.92  95.35   65130000      95.35
1     2014-07-07  94.14  95.99  94.10  95.97   56305400      95.97
2     2014-07-03  93.67  94.10  93.20  94.03   22891800      94.03
3     2014-07-02  93.87  94.06  93.09  93.48   28420900      93.48
4     2014-07-01  93.52  94.07  93.13  93.52   38170200      93.52
...          ...    ...    ...    ...    ...        ...        ...
8460  1980-12-18  26.63  26.75  26.63  26.63   18362400       0.41
8461  1980-12-17  25.87  26.00  25.87  25.87   21610400       0.40
8462  1980-12-16  25.37  25.37  25.25  25.25   26432000       0.39
8463  1980-12-15  27.38  27.38  27.25  27.25   43971200       0.42
8464  1980-12-12  28.75  28.87  28.75  28.75  117258400       0.45

8465 rows × 7 columns

# Set the Date column as the index
df3.set_index('Date', inplace=True)
df3
             Open   High    Low  Close     Volume  Adj Close
Date
2014-07-08  96.27  96.80  93.92  95.35   65130000      95.35
2014-07-07  94.14  95.99  94.10  95.97   56305400      95.97
2014-07-03  93.67  94.10  93.20  94.03   22891800      94.03
2014-07-02  93.87  94.06  93.09  93.48   28420900      93.48
2014-07-01  93.52  94.07  93.13  93.52   38170200      93.52
...           ...    ...    ...    ...        ...        ...
1980-12-18  26.63  26.75  26.63  26.63   18362400       0.41
1980-12-17  25.87  26.00  25.87  25.87   21610400       0.40
1980-12-16  25.37  25.37  25.25  25.25   26432000       0.39
1980-12-15  27.38  27.38  27.25  27.25   43971200       0.42
1980-12-12  28.75  28.87  28.75  28.75  117258400       0.45

8465 rows × 6 columns

# Resample the time series to monthly data and compute the sum of the other columns for each month
df3.resample('ME').sum()
                Open      High       Low     Close      Volume  Adj Close
Date
1980-12-31    396.26    397.38    395.76    395.76   336212800       6.15
1981-01-31    666.85    668.36    664.75    664.75   152247200      10.37
1981-02-28    503.12    504.87    501.75    501.75    80404800       7.81
1981-03-31    548.63    550.37    546.40    546.40   175179200       8.53
1981-04-30    573.02    574.73    571.77    571.77   134232000       8.89
...              ...       ...       ...       ...         ...        ...
2014-03-31  11205.46  11265.53  11131.49  11197.50  1250424700    1590.75
2014-04-30  11341.72  11431.33  11261.51  11362.56  1608765200    1614.21
2014-05-31  12627.34  12733.83  12564.99  12667.11  1433917100    1807.23
2014-06-30   4669.56   4705.77   4635.45   4675.82  1206556300    1929.60
2014-07-31    471.47    475.02    467.44    472.35   210918300     472.35

404 rows × 6 columns
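resample is not limited to a single statistic; agg can apply a different function to each column. A minimal sketch, assuming a pandas version that understands the 'ME' alias (use 'M' on older releases):
# Mean closing price but total volume per month
df3.resample('ME').agg({'Close': 'mean', 'Volume': 'sum'})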

# Resample the time series to yearly data and compute the mean of the other columns for each year
df3.resample('YE').mean()
                  Open        High         Low       Close        Volume  Adj Close
Date
1980-12-31   30.481538   30.567692   30.443077   30.443077  2.586252e+07   0.473077
1981-12-31   24.386349   24.471865   24.311151   24.311151  8.131889e+06   0.378651
1982-12-31   19.139723   19.412688   18.957036   19.142727  2.111167e+07   0.298261
1983-12-31   37.524841   38.376071   36.669841   37.521984  4.134987e+07   0.584643
1984-12-31   26.869960   27.393755   26.351581   26.801897  4.148126e+07   0.417787
1985-12-31   20.378814   20.595178   20.128656   20.194941  4.495383e+07   0.314862
1986-12-31   32.387391   32.938498   31.853676   32.460672  5.269093e+07   0.505494
1987-12-31   53.822688   55.036443   52.694585   53.889526  5.906256e+07   1.215652
1988-12-31   41.555889   42.186364   40.890356   41.538893  4.080334e+07   1.305771
1989-12-31   41.615000   42.300238   40.978611   41.658571  5.050181e+07   1.322341
1990-12-31   37.502016   38.219486   36.817233   37.561937  4.387544e+07   1.205257
1991-12-31   52.451542   53.425534   51.506877   52.494545  5.666764e+07   1.702292
1992-12-31   54.803661   55.602047   53.965000   54.802835  4.049007e+07   1.792795
1993-12-31   41.063241   41.778300   40.284783   41.026601  5.578353e+07   1.354664
1994-12-31   34.052222   34.711548   33.412897   34.080317  5.670228e+07   1.142738
1995-12-31   40.623056   41.267024   39.908413   40.540476  7.367712e+07   1.375556
1996-12-31   25.048110   25.421378   24.504803   24.919409  5.235652e+07   0.850709
1997-12-31   18.032372   18.360237   17.628972   17.966403  7.111004e+07   0.613123
1998-12-31   30.512381   31.265556   29.776627   30.564603  1.142800e+08   1.043413
1999-12-31   57.659484   59.099881   56.300992   57.770278  1.360146e+08   1.972063
2000-12-31   71.863889   74.191230   69.609563   71.749206  1.193468e+08   3.120873
2001-12-31   20.165323   20.766290   19.622379   20.219355  9.542117e+07   1.380484
2002-12-31   19.128056   19.522063   18.716270   19.139444  7.640271e+07   1.306825
2003-12-31   18.521786   18.843492   18.206984   18.544762  7.066493e+07   1.265992
2004-12-31   35.421468   36.029444   34.924643   35.526944  1.208350e+08   2.425635
2005-12-31   52.349683   53.111230   51.588214   52.401746  1.809534e+08   6.373651
2006-12-31   70.987610   71.939124   69.810359   70.810637  2.148396e+08   9.668964
2007-12-31  128.389084  130.070478  126.184502  128.273904  2.460119e+08  17.515219
2008-12-31  142.313755  145.110672  138.857708  141.979012  2.825901e+08  19.386324
2009-12-31  146.619087  148.495675  144.964881  146.814127  1.421168e+08  20.047063
2010-12-31  259.957619  262.368810  256.847619  259.842460  1.498263e+08  35.479802
2011-12-31  364.061429  367.423571  360.297698  364.004325  1.230747e+08  49.703135
2012-12-31  576.652720  581.825400  569.921160  576.049720  1.319642e+08  78.847600
2013-12-31  473.128135  477.638929  468.247103  472.634881  1.016087e+08  65.994563
2014-12-31  477.553256  481.363721  474.229922  478.036589  7.265242e+07  80.837674
# Reset the index
df3.reset_index(inplace=True)
df3
            Date   Open   High    Low  Close     Volume  Adj Close
0     2014-07-08  96.27  96.80  93.92  95.35   65130000      95.35
1     2014-07-07  94.14  95.99  94.10  95.97   56305400      95.97
2     2014-07-03  93.67  94.10  93.20  94.03   22891800      94.03
3     2014-07-02  93.87  94.06  93.09  93.48   28420900      93.48
4     2014-07-01  93.52  94.07  93.13  93.52   38170200      93.52
...          ...    ...    ...    ...    ...        ...        ...
8460  1980-12-18  26.63  26.75  26.63  26.63   18362400       0.41
8461  1980-12-17  25.87  26.00  25.87  25.87   21610400       0.40
8462  1980-12-16  25.37  25.37  25.25  25.25   26432000       0.39
8463  1980-12-15  27.38  27.38  27.25  27.25   43971200       0.42
8464  1980-12-12  28.75  28.87  28.75  28.75  117258400       0.45

8465 rows × 7 columns

# Generate year, month, day, and weekday columns
df3['year'] = df3['Date'].dt.year
df3['month'] = df3['Date'].dt.month
df3['day'] = df3['Date'].dt.day
df3['weekday'] = df3['Date'].dt.weekday
df3
            Date   Open   High    Low  Close     Volume  Adj Close  year  month  day  weekday
0     2014-07-08  96.27  96.80  93.92  95.35   65130000      95.35  2014      7    8        1
1     2014-07-07  94.14  95.99  94.10  95.97   56305400      95.97  2014      7    7        0
2     2014-07-03  93.67  94.10  93.20  94.03   22891800      94.03  2014      7    3        3
3     2014-07-02  93.87  94.06  93.09  93.48   28420900      93.48  2014      7    2        2
4     2014-07-01  93.52  94.07  93.13  93.52   38170200      93.52  2014      7    1        1
...          ...    ...    ...    ...    ...        ...        ...   ...    ...  ...      ...
8460  1980-12-18  26.63  26.75  26.63  26.63   18362400       0.41  1980     12   18        3
8461  1980-12-17  25.87  26.00  25.87  25.87   21610400       0.40  1980     12   17        2
8462  1980-12-16  25.37  25.37  25.25  25.25   26432000       0.39  1980     12   16        1
8463  1980-12-15  27.38  27.38  27.25  27.25   43971200       0.42  1980     12   15        0
8464  1980-12-12  28.75  28.87  28.75  28.75  117258400       0.45  1980     12   12        4

8465 rows × 11 columns
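The dt accessor exposes more than the numeric date parts; a small sketch of a couple of other attributes that are often handy for feature engineering:
# English weekday name and a month-end flag for each date
df3['weekday_name'] = df3['Date'].dt.day_name()
df3['is_month_end'] = df3['Date'].dt.is_month_end
df3[['Date', 'weekday', 'weekday_name', 'is_month_end']].head()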

# Create the date of the founding of the People's Republic of China
new_china = pd.to_datetime('1949-10-01')
new_china
Timestamp('1949-10-01 00:00:00')
# Extract the year, month, day, and weekday of new_china
new_china.year, new_china.month, new_china.day, new_china.weekday()
(1949, 10, 1, 5)
# Check whether that day was a weekday
new_china.weekday() < 5
False

