[Data Mining and Business Intelligence Decision-Making] Chapter 9: Random Forest Models


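9.1.3 Implementing the Random Forest Model in Code

Like decision tree models, random forest models can be used for both classification and regression.

The corresponding models are the random forest classifier (RandomForestClassifier) and the random forest regressor (RandomForestRegressor). The base model of the classifier is the classification decision tree (see Section 5.1.2), and the base model of the regressor is the regression decision tree (see Section 5.1.3).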

# A simple demonstration of the random forest classification model:
from sklearn.ensemble import RandomForestClassifier
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [0, 0, 0, 1, 1]

model = RandomForestClassifier(n_estimators=10, random_state=123)
model.fit(X, y)

print(model.predict([[5, 5]]))
[0]
# A simple demonstration of the random forest regression model:
from sklearn.ensemble import RandomForestRegressor
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]

model = RandomForestRegressor(n_estimators=10, random_state=123)
model.fit(X, y)

print(model.predict([[5, 5]]))
[2.8]
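
To make the link between the forest and its base trees concrete, the following sketch refits the two toy models above and checks that the forest's output is the average over its individual trees (class probabilities for the classifier, numeric predictions for the regressor). The variable names clf, reg and sample are illustrative only; this is a minimal check on toy data, not part of the original walkthrough.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

clf = RandomForestClassifier(n_estimators=10, random_state=123).fit(X, [0, 0, 0, 1, 1])
reg = RandomForestRegressor(n_estimators=10, random_state=123).fit(X, [1, 2, 3, 4, 5])

sample = np.array([[5, 5]])

# Classifier: average the class probabilities predicted by the individual trees.
tree_probas = np.mean([tree.predict_proba(sample) for tree in clf.estimators_], axis=0)

# Regressor: average the numeric predictions of the individual trees.
tree_mean = np.mean([tree.predict(sample) for tree in reg.estimators_], axis=0)

print(np.allclose(tree_probas, clf.predict_proba(sample)))
print(np.allclose(tree_mean, reg.predict(sample)))
```

The fitted trees are available through the estimators_ attribute, which is why no extra training is needed to inspect them.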

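9.2 Quantitative Finance - Obtaining Stock Data

9.2.1 Obtaining Basic Stock Data

This section introduces Tushare, a free Python interface package for financial data that lets us retrieve historical market data for analysis at no cost. Its official site is http://tushare.org/
To look up the stock quote interfaces, visit http://tushare.org/trading.html

1. A brief introduction to the Tushare library

Installing Tushare with pip is recommended. On Windows, for example, press Win + R to open the Run dialog, type cmd and press Enter, then type pip install tushare in the window that opens and press Enter. To install from the Jupyter Notebook editor described in Section 1.2.3, enter !pip install tushare in a code cell (the ! must be the half-width ASCII character) and run the cell.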
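
After installing, it can be useful to confirm that the package is visible to the interpreter you are actually running. The helper below is a small sketch using only the standard library; the name is_installed is hypothetical and not part of Tushare.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install tushare` has succeeded in this environment.
print(is_installed("tushare"))
```

The same check works for any other package used later in the chapter, for example is_installed("talib").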

(1) Obtaining daily quote data

import tushare as ts
df = ts.get_hist_data('000002', start='2018-01-01', end='2019-01-31')
df.head()
This interface will soon stop being updated; please switch to the Pro interface as soon as possible: https://tushare.pro/document/2
             open   high  close    low     volume  price_change  p_change     ma5    ma10    ma20      v_ma5     v_ma10     v_ma20  turnover
date

Note that if the start and end dates are omitted, ts.get_hist_data('000002') retrieves by default the data for the three years up to the current day. The code above can also be abbreviated as:

df = ts.get_hist_data('000002', '2018-01-01', '2019-01-31')
df.head()
             open   high  close    low     volume  price_change  p_change     ma5    ma10    ma20      v_ma5     v_ma10     v_ma20
date
2019-01-31  27.39  28.15  27.75  27.00  411857.59          0.54      1.99  26.800  26.153  25.641  426579.02  351523.31  320269.20
2019-01-30  26.70  27.82  27.21  26.63  592303.19          0.33      1.23  26.332  25.875  25.457  391193.72  334927.14  310794.00
2019-01-29  25.91  26.88  26.88  25.87  368071.62          0.82      3.15  25.952  25.696  25.292  302102.48  302443.43  293529.36
2019-01-28  26.20  26.62  26.06  25.86  308906.56         -0.04     -0.15  25.656  25.524  25.139  304355.52  302512.15  291266.32
2019-01-25  25.51  26.35  26.10  25.49  451756.16          0.69      2.71  25.574  25.420  25.008  293674.18  289949.63  293446.08

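Supplementary note: the get_k_data() function

Because get_hist_data() returns not only basic price information but also derived variables such as price changes and moving-average prices, it can retrieve at most three years of data counting back from the current day. To retrieve more than three years of daily data, use ts.get_k_data(), which returns only the basic price data:

df = ts.get_k_data('000002', start='2000-01-01', end='2019-01-31')
df.head()
         date   open  close   high    low     volume    code
0  2000-01-04  0.584  0.614  0.620  0.572   45747.08  000002
1  2000-01-05  0.617  0.599  0.623  0.596   46136.73  000002
2  2000-01-06  0.596  0.627  0.632  0.587   71920.31  000002
3  2000-01-07  0.631  0.655  0.656  0.624  136349.36  000002
4  2000-01-10  0.673  0.721  0.721  0.665  142424.86  000002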

Unlike get_hist_data(), get_k_data() does not set the date as the row index by default; the date remains an ordinary column (the date column). To turn the date column into the row index, use the set_index() function:

df = df.set_index('date')  # or equivalently: df.set_index('date', inplace=True)
df.head()
             open  close   high    low     volume    code
date
2000-01-04  0.584  0.614  0.620  0.572   45747.08  000002
2000-01-05  0.617  0.599  0.623  0.596   46136.73  000002
2000-01-06  0.596  0.627  0.632  0.587   71920.31  000002
2000-01-07  0.631  0.655  0.656  0.624  136349.36  000002
2000-01-10  0.673  0.721  0.721  0.665  142424.86  000002
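
The effect of set_index() can be tried without a network connection. The sketch below builds a tiny DataFrame by hand from three of the rows shown above (only the open and close columns are kept), converts the date strings to timestamps with pd.to_datetime(), which is an extra step not used in the original code, and then looks rows up by date.

```python
import pandas as pd

# Hand-built rows in the same layout as the get_k_data() output above.
df = pd.DataFrame({
    "date": ["2000-01-04", "2000-01-05", "2000-01-06"],
    "open": [0.584, 0.617, 0.596],
    "close": [0.614, 0.599, 0.627],
})

df["date"] = pd.to_datetime(df["date"])  # strings -> Timestamp (optional extra step)
df = df.set_index("date")                # the date column becomes the row index

# Rows can now be selected by date instead of by position.
print(df.loc[pd.Timestamp("2000-01-05"), "close"])
print(df.loc["2000-01-05":"2000-01-06", "close"].tolist())
```

Converting to timestamps is a design choice: it enables date-range slicing and nicer axis labels when plotting, at the cost of one extra line.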

(2) Obtaining minute-level data

Minute-level data can be obtained by setting the ktype parameter:

df = ts.get_hist_data('000002', ktype='5')  # 5-minute bars
df.head()
                      open   high  close    low    volume  price_change  p_change     ma5    ma10     ma20    v_ma5   v_ma10   v_ma20  turnover
date
2020-01-03 15:00:00  32.06  32.07  32.06  32.05   3920.32          0.00      0.00  32.122  32.113  32.0350  15322.7  17669.5  13041.0      0.00
2020-01-03 14:55:00  32.11  32.11  32.07  32.03   8377.52         -0.04     -0.12  32.136  32.103  32.0290  19359.3  17817.5  13428.9      0.01
2020-01-03 14:50:00  32.20  32.21  32.12  32.11  13402.00         -0.08     -0.25  32.154  32.093  32.0175  23136.3  17962.0  13959.7      0.01
2020-01-03 14:45:00  32.16  32.21  32.20  32.12  24470.90          0.04      0.12  32.160  32.078  32.0050  24442.3  17137.9  13903.3      0.03
2020-01-03 14:40:00  32.13  32.18  32.16  32.13  26443.00          0.03      0.09  32.132  32.056  31.9880  23976.3  15128.1  13491.1      0.03

(3) Obtaining real-time quotes

The following code retrieves the current quote and trade information for a stock in real time:

df = ts.get_realtime_quotes('000002')
df
    name    open  pre_close   price    high     low     bid     ask    volume          amount  ...    a2_p  a3_v    a3_p  a4_v    a4_p  a5_v    a5_p        date      time    code
0  万 科A  32.710     32.560  32.050  32.810  31.780  32.040  32.050  80553629  2584309903.290  ...  32.060  3005  32.070   119  32.080   344  32.090  2020-01-03  15:00:03  000002

1 rows × 33 columns

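The result reflects the quote at the moment the code runs; if it is run after the market closes, it returns the day's closing information. If there are too many columns, select the ones you need with DataFrame column selection:

df = df[['code','name','price','bid','ask','volume','amount','time']]
df
     code    name   price     bid     ask    volume          amount      time
0  000002  万 科A  32.050  32.040  32.050  80553629  2584309903.290  15:00:03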

To retrieve real-time data for several stock codes at once, use:

df = ts.get_realtime_quotes(['000002','000980','000981'])
df
     name    open  pre_close   price    high     low     bid     ask    volume          amount  ...    a2_p  a3_v    a3_p  a4_v    a4_p  a5_v    a5_p        date      time    code
0   万 科A  32.710     32.560  32.050  32.810  31.780  32.040  32.050  80553629  2584309903.290  ...  32.060  3005  32.070   119  32.080   344  32.090  2020-01-03  15:00:03  000002
1  众泰汽车   3.010      3.000   3.020   3.040   2.970   3.010   3.020  32495074    97566972.190  ...   3.030  4849   3.040  3840   3.050  2811   3.060  2020-01-03  15:00:03  000980
2   ST银亿   1.870      1.890   1.810   1.920   1.800   1.810   1.820  40518670    74744476.400  ...   1.830  2939   1.840  4163   1.850  1449   1.860  2020-01-03  15:00:03  000981

3 rows × 33 columns

(4) Obtaining tick data

The following code retrieves historical tick data, that is, the details of each individual trade (in the type column, 买盘 marks a buy and 卖盘 a sell):

df = ts.get_tick_data('000002', date='2018-12-12', src='tt')
df.head()
D:\Anaconda\Anaconda\lib\site-packages\tushare\stock\trading.py:182: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  skiprows=[0])
       time  price  change  volume    amount  type
0  09:25:04  26.31    0.34    6077  15988903  卖盘
1  09:30:00  26.33    0.02     197    518651  买盘
2  09:30:04  26.33    0.00    4623  12173863  卖盘
3  09:30:06  26.34    0.01     391   1030134  买盘
4  09:30:09  26.35    0.01    3289   8664911  买盘

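(5) Obtaining index information

The following code retrieves information on indices such as the SSE Composite Index:

df = ts.get_index()
df.head()  # Note (2020-01-04): the column names of the index data returned by tushare are currently somewhat scrambled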
codenamechangeopenprecloseclosehighlowvolumeamount
100上证指数3089.02200.333085.19763083.78583093.81923074.51780.02.899917e+110.0
200A股指数3236.70770.333232.68923231.18853241.74363221.49060.02.899041e+110.0
300B股指数261.05100.00261.1236261.7619261.7619260.24290.08.764934e+070.0
800综合指数3006.02950.392999.17443006.53183018.16992998.42660.06.499701e+100.0
90上证3804885.02670.234881.72354879.54714890.88384858.43250.05.888844e+100.0

9.2.2 Generating Derived Stock Variables

1. Generating basic stock data

First, use the get_k_data() function from the previous section to obtain basic stock data from 2015-01-01 to 2019-12-31:

df = ts.get_k_data('000002', start='2015-01-01', end='2019-12-31')
df.head()
         date    open   close    high     low     volume    code
0  2015-01-05  12.436  12.885  13.214  12.289  6560835.0  000002
1  2015-01-06  12.617  12.410  12.954  12.142  3346346.0  000002
2  2015-01-07  12.324  12.298  12.531  12.099  2642051.0  000002
3  2015-01-08  12.375  11.745  12.419  11.632  2639394.0  000002
4  2015-01-09  11.701  11.624  12.289  11.485  3294584.0  000002
# set_index() turns the date column into the row index:
df = df.set_index('date')
df.head()
              open   close    high     low     volume    code
date
2015-01-05  12.436  12.885  13.214  12.289  6560835.0  000002
2015-01-06  12.617  12.410  12.954  12.142  3346346.0  000002
2015-01-07  12.324  12.298  12.531  12.099  2642051.0  000002
2015-01-08  12.375  11.745  12.419  11.632  2639394.0  000002
2015-01-09  11.701  11.624  12.289  11.485  3294584.0  000002

2. Computing simple derived variables

The following code first constructs some simple derived variables:

df['close-open'] = (df['close'] - df['open'])/df['open']
df['high-low'] = (df['high'] - df['low'])/df['low']

df['pre_close'] = df['close'].shift(1)  # shift the close column down one row to get the previous day's close
df['price_change'] = df['close']-df['pre_close']
df['p_change'] = (df['close']-df['pre_close'])/df['pre_close']*100

df.head()
              open   close    high     low     volume    code  close-open  high-low  pre_close  price_change   p_change
date
2015-01-05  12.436  12.885  13.214  12.289  6560835.0  000002    0.036105  0.075271        NaN           NaN        NaN
2015-01-06  12.617  12.410  12.954  12.142  3346346.0  000002   -0.016406  0.066875     12.885        -0.475  -3.686457
2015-01-07  12.324  12.298  12.531  12.099  2642051.0  000002   -0.002110  0.035705     12.410        -0.112  -0.902498
2015-01-08  12.375  11.745  12.419  11.632  2639394.0  000002   -0.050909  0.067658     12.298        -0.553  -4.496666
2015-01-09  11.701  11.624  12.289  11.485  3294584.0  000002   -0.006581  0.070004     11.745        -0.121  -1.030226
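
The shift-based calculation can be checked in isolation. The sketch below uses the five closing prices from the table above and also compares the result with pandas' pct_change(), a one-call alternative that the original code does not use.

```python
import numpy as np
import pandas as pd

# First five closing prices from the table above.
close = pd.Series([12.885, 12.410, 12.298, 11.745, 11.624])

pre_close = close.shift(1)                        # previous day's close; first row is NaN
p_change = (close - pre_close) / pre_close * 100  # percentage change, as in the text

# pct_change() computes the same day-over-day ratio in one call.
p_change_alt = close.pct_change() * 100

print(p_change.round(6).tolist())
print(np.allclose(p_change.iloc[1:], p_change_alt.iloc[1:]))
```

The first row is NaN in both versions because there is no previous day to compare against.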

3. Moving average (MA) indicators

The following code computes the 5-day and 10-day moving averages of the closing price:

df['MA5'] = df['close'].rolling(5).mean()
df['MA10'] = df['close'].rolling(10).mean()

df.head(15)  # head(15) shows the first 15 rows; MA10 only has values from the 10th row onward
              open   close    high     low     volume    code  close-open  high-low  pre_close  price_change   p_change      MA5     MA10
date
2015-01-05  12.436  12.885  13.214  12.289  6560835.0  000002    0.036105  0.075271        NaN           NaN        NaN      NaN      NaN
2015-01-06  12.617  12.410  12.954  12.142  3346346.0  000002   -0.016406  0.066875     12.885        -0.475  -3.686457      NaN      NaN
2015-01-07  12.324  12.298  12.531  12.099  2642051.0  000002   -0.002110  0.035705     12.410        -0.112  -0.902498      NaN      NaN
2015-01-08  12.375  11.745  12.419  11.632  2639394.0  000002   -0.050909  0.067658     12.298        -0.553  -4.496666      NaN      NaN
2015-01-09  11.701  11.624  12.289  11.485  3294584.0  000002   -0.006581  0.070004     11.745        -0.121  -1.030226  12.1924      NaN
2015-01-12  11.511  11.338  11.511  11.019  2436341.0  000002   -0.015029  0.044650     11.624        -0.286  -2.460427  11.8830      NaN
2015-01-13  11.278  11.295  11.563  11.209  1664610.0  000002    0.001507  0.031582     11.338        -0.043  -0.379256  11.6600      NaN
2015-01-14  11.295  11.321  11.494  11.122  1646818.0  000002    0.002302  0.033447     11.295         0.026   0.230190  11.4646      NaN
2015-01-15  11.347  11.900  11.952  11.235  2429686.0  000002    0.048735  0.063818     11.321         0.579   5.114389  11.4956      NaN
2015-01-16  11.900  11.684  11.900  11.572  2129475.0  000002   -0.018151  0.028344     11.900        -0.216  -1.815126  11.5076  11.8500
2015-01-19  10.803  10.517  11.148  10.517  3603625.0  000002   -0.026474  0.059998     11.684        -1.167  -9.988018  11.3434  11.6132
2015-01-20  10.543  10.673  10.889  10.422  2914688.0  000002    0.012330  0.044809     10.517         0.156   1.483313  11.2190  11.4395
2015-01-21  10.656  11.278  11.407  10.457  3555294.0  000002    0.058371  0.090848     10.673         0.605   5.668509  11.2104  11.3375
2015-01-22  11.252  11.736  11.796  11.166  3224727.0  000002    0.043015  0.056421     11.278         0.458   4.061004  11.1776  11.3366
2015-01-23  11.727  12.030  12.177  11.494  3310408.0  000002    0.025838  0.059422     11.736         0.294   2.505112  11.2468  11.3772
# Drop rows with missing values
df.dropna(inplace=True)  # equivalently: df = df.dropna()
df.head()
              open   close    high     low     volume    code  close-open  high-low  pre_close  price_change   p_change      MA5     MA10
date
2015-01-16  11.900  11.684  11.900  11.572  2129475.0  000002   -0.018151  0.028344     11.900        -0.216  -1.815126  11.5076  11.8500
2015-01-19  10.803  10.517  11.148  10.517  3603625.0  000002   -0.026474  0.059998     11.684        -1.167  -9.988018  11.3434  11.6132
2015-01-20  10.543  10.673  10.889  10.422  2914688.0  000002    0.012330  0.044809     10.517         0.156   1.483313  11.2190  11.4395
2015-01-21  10.656  11.278  11.407  10.457  3555294.0  000002    0.058371  0.090848     10.673         0.605   5.668509  11.2104  11.3375
2015-01-22  11.252  11.736  11.796  11.166  3224727.0  000002    0.043015  0.056421     11.278         0.458   4.061004  11.1776  11.3366
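
To see why the first rows of MA5 and MA10 are NaN, the sketch below recomputes both averages on the first ten closing prices from the table above and compares the first MA5 value with a manual average. The variable names are illustrative.

```python
import pandas as pd

# First ten closing prices from the table above (2015-01-05 to 2015-01-16).
close = pd.Series([12.885, 12.410, 12.298, 11.745, 11.624,
                   11.338, 11.295, 11.321, 11.900, 11.684])

ma5 = close.rolling(5).mean()    # needs 5 observations, so rows 0-3 are NaN
ma10 = close.rolling(10).mean()  # needs 10 observations, so rows 0-8 are NaN

# The first MA5 value is simply the plain average of the first five closes.
manual_ma5 = close.iloc[0:5].sum() / 5

print(round(ma5.iloc[4], 4), round(manual_ma5, 4), round(ma10.iloc[9], 4))
```

This is also why dropna() removes the first nine rows: each of them lacks an MA10 value.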

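4. Installing TA-Lib, the library for generating derived stock variables

All the derived indicators discussed below are generated with the TA-Lib library, so we first explain how to install it.

Take Windows as an example. On 64-bit Windows, running pip install talib directly raises an error. The explanation given here is that the TA-Lib build in the pip index is 32-bit and cannot be installed on a 64-bit platform.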

The correct approach is to download a 64-bit installation package and install it locally. The recommended source is the collection of Python extension packages hosted at the University of California: https://www.lfd.uci.edu/~gohlke/pythonlibs/

On that page, press Ctrl + F and search for "ta_lib".

Select the matching file, here TA_Lib-0.4.17-cp37-cp37m-win_amd64.whl (the 37 after cp denotes Python 3.7), and download it to a folder of your choice. Be sure to pick the file that matches your own Python version.

To check your Python version, press Win + R to open the Run dialog, type cmd, then type python in the window that opens and press Enter; the version is shown in the interpreter's startup message.

After the download finishes, open the folder where you saved the file (the author used "E:\机器学习与大数据分析\随机森林"), type cmd in the address bar of the folder window and press Enter to open a command prompt in that folder.

In the window that opens, enter the following command and press Enter to install:

pip install TA_Lib-0.4.17-cp37-cp37m-win_amd64.whl

5. Generating the relative strength index (RSI) with TA-Lib

import talib
df['RSI'] = talib.RSI(df['close'], timeperiod=12)

6. Generating the momentum indicator (MOM) with TA-Lib

df['MOM'] = talib.MOM(df['close'], timeperiod=5)

7. Generating the exponential moving average (EMA) with TA-Lib

df['EMA12'] = talib.EMA(df['close'], timeperiod=12)  # 12-day exponential moving average
df['EMA26'] = talib.EMA(df['close'], timeperiod=26)  # 26-day exponential moving average
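
For readers without TA-Lib installed, the two indicators above can be approximated in plain pandas. This is a sketch, not TA-Lib's implementation: momentum is taken as the close minus the close n rows earlier, and the EMA uses the standard recursion with smoothing factor 2 / (span + 1). It is an assumption here that TA-Lib initialises its EMA differently, so the earliest values may not match talib.EMA exactly.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0])

# Momentum over 5 periods: today's close minus the close 5 rows earlier.
mom = close.diff(5)

# Exponential moving average with alpha = 2 / (span + 1); here span=3 gives alpha=0.5.
# adjust=False applies the plain recursion ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}.
ema = close.ewm(span=3, adjust=False).mean()

print(mom.tolist())
print(ema.round(4).tolist())
```

A short span is used only to keep the toy series small; the chapter's code uses 12 and 26.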

8. Generating the moving average convergence/divergence (MACD) values with TA-Lib

df['MACD'], df['MACDsignal'], df['MACDhist'] = talib.MACD(df['close'], fastperiod=12, slowperiod=26, signalperiod=9)
df.dropna(inplace=True)  # drop rows with missing values
df.tail()  # the counterpart of head(): tail() shows the last five rows
             open  close   high    low    volume    code  close-open  high-low  pre_close  price_change   p_change     MA5    MA10        RSI    MOM      EMA12      EMA26      MACD  MACDsignal  MACDhist
date
2019-12-25  30.40  30.29  30.63  30.18  685037.0  000002   -0.003618  0.014911      30.38         -0.09  -0.296248  30.878  30.075  63.075563  -0.02  29.908556  28.973211  0.935345    0.772958  0.162387
2019-12-26  30.50  31.12  31.30  30.50  888790.0  000002    0.020328  0.026230      30.29          0.83   2.740178  30.896  30.387  68.890164   0.09  30.094932  29.132233  0.962699    0.810906  0.151793
2019-12-27  31.23  31.00  31.32  30.81  703096.0  000002   -0.007365  0.016553      31.12         -0.12  -0.385604  30.760  30.672  67.220611  -0.68  30.234173  29.270586  0.963587    0.841442  0.122145
2019-12-30  31.35  31.57  31.79  31.02  915751.0  000002    0.007018  0.024823      31.00          0.57   1.838710  30.872  30.884  70.877814   0.56  30.439685  29.440913  0.998772    0.872908  0.125864
2019-12-31  31.35  32.18  32.45  31.32  663497.0  000002    0.026475  0.036079      31.57          0.61   1.932214  31.232  31.057  74.233951   1.80  30.707426  29.643808  1.063618    0.911050  0.152567
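
The three MACD outputs are related in a simple way, which the 2019-12-25 row above illustrates: MACD = EMA12 - EMA26 (29.908556 - 28.973211 = 0.935345) and MACDhist = MACD - MACDsignal (0.935345 - 0.772958 = 0.162387). The sketch below reproduces that structure in plain pandas on a synthetic price series. The function name macd is hypothetical, and because the EMA initialisation is assumed to differ from TA-Lib's, the numbers are not expected to match talib.MACD exactly.

```python
import numpy as np
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """Return (macd_line, signal_line, histogram) computed with pandas EMAs."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow                                   # fast EMA minus slow EMA
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()     # EMA of the MACD line
    hist = macd_line - signal_line                                    # distance between the two
    return macd_line, signal_line, hist


# Synthetic random-walk prices, used only for illustration.
rng = np.random.default_rng(0)
close = pd.Series(30 + rng.normal(0, 0.3, 200).cumsum())

macd_line, signal_line, hist = macd(close)
print(pd.DataFrame({"MACD": macd_line, "signal": signal_line, "hist": hist}).tail())
```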

Supplementary content: some checks on the TA-Lib library

Verifying the RSI indicator

import pandas as pd
import talib

data = pd.DataFrame()
data['close'] = [10, 12, 11, 13, 12, 14, 13]
data['RSI'] = talib.RSI(data['close'], timeperiod=6)

data
   close        RSI
0     10        NaN
1     12        NaN
2     11        NaN
3     13        NaN
4     12        NaN
5     14        NaN
6     13  66.666667
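
The value 66.666667 can be reproduced by hand: over the six price changes (+2, -1, +2, -1, +2, -1) the average gain is 6/6 = 1 and the average loss is 3/6 = 0.5, so RS = 2 and RSI = 100 - 100 / (1 + 2) = 66.67. The sketch below implements this in NumPy, assuming the usual Wilder smoothing for later rows (seed with a simple average, then blend in each new change). The function name rsi_wilder is hypothetical, and only the first value is confirmed by the table above.

```python
import numpy as np
import pandas as pd


def rsi_wilder(close: pd.Series, period: int) -> pd.Series:
    """RSI with Wilder smoothing; the first `period` rows are NaN."""
    delta = close.diff().to_numpy()[1:]        # delta[j] = close[j + 1] - close[j]
    gains = np.where(delta > 0, delta, 0.0)
    losses = np.where(delta < 0, -delta, 0.0)

    out = np.full(len(close), np.nan)
    if len(delta) < period:
        return pd.Series(out, index=close.index)

    # Seed with simple averages over the first `period` changes.
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for i in range(period, len(close)):
        if i > period:
            # Wilder smoothing: blend the previous average with the newest change.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        out[i] = 100.0 if avg_loss == 0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return pd.Series(out, index=close.index)


close = pd.Series([10, 12, 11, 13, 12, 14, 13], dtype=float)
rsi = rsi_wilder(close, period=6)
print(rsi)
```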

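9.3 Quantitative Finance - Building a Stock Rise/Fall Prediction Model

9.3.1 Building the Multi-Factor Model

1. Importing the required libraries

import tushare as ts  # basic stock data
import numpy as np  # numerical computing
import pandas as pd  # data analysis
import talib  # derived stock indicators
import matplotlib.pyplot as plt  # plotting
from sklearn.ensemble import RandomForestClassifier  # random forest classification model
from sklearn.metrics import accuracy_score  # accuracy scoring function
import warnings
warnings.filterwarnings("ignore")  # suppress warnings; they are not errors and do not stop execution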

2. Processing the stock data and generating derived variables

Here we gather the code from Section 9.2 for the basic stock data and the derived variables in one place, to make it easier to build the prediction model afterwards:

# 1. Obtain the basic stock data
df = ts.get_k_data('000002', start='2015-01-01', end='2019-12-31')
df = df.set_index('date')  # use the date as the index

# 2. Construct simple derived variables
df['close-open'] = (df['close'] - df['open'])/df['open']
df['high-low'] = (df['high'] - df['low'])/df['low']

df['pre_close'] = df['close'].shift(1)  # shift the close column down one row to get the previous day's close
df['price_change'] = df['close']-df['pre_close']
df['p_change'] = (df['close']-df['pre_close'])/df['pre_close']*100

# 3. Construct the moving averages
df['MA5'] = df['close'].rolling(5).mean()
df['MA10'] = df['close'].rolling(10).mean()
df.dropna(inplace=True)  # drop rows with missing values

# 4. Construct derived variables with TA-Lib
df['RSI'] = talib.RSI(df['close'], timeperiod=12)  # relative strength index
df['MOM'] = talib.MOM(df['close'], timeperiod=5)  # momentum
df['EMA12'] = talib.EMA(df['close'], timeperiod=12)  # 12-day exponential moving average
df['EMA26'] = talib.EMA(df['close'], timeperiod=26)  # 26-day exponential moving average
df['MACD'], df['MACDsignal'], df['MACDhist'] = talib.MACD(df['close'], fastperiod=12, slowperiod=26, signalperiod=9)  # MACD values
df.dropna(inplace=True)  # drop rows with missing values
This interface will soon stop being updated; please switch to the Pro interface as soon as possible: https://tushare.pro/document/2
# Look at the last five rows of df
df.tail()
              open   close    high     low    volume    code  close-open  high-low  pre_close  price_change   p_change     MA5    MA10        RSI    MOM      EMA12      EMA26      MACD  MACDsignal  MACDhist
date
2019-12-25  27.165  27.055  27.395  26.945  685037.0  000002   -0.004049  0.016701     27.145         -0.09  -0.331553  27.643  26.840  63.081344  -0.02  26.673555  25.737103  0.936452    0.774585  0.161867
2019-12-26  27.265  27.885  28.065  27.265  888790.0  000002    0.022740  0.029342     27.055          0.83   3.067825  27.661  27.152  68.895291   0.09  26.859932  25.896207  0.963725    0.812413  0.151311
2019-12-27  27.995  27.765  28.085  27.575  703096.0  000002   -0.008216  0.018495     27.885         -0.12  -0.430339  27.525  27.437  67.225542  -0.68  26.999173  26.034636  0.964537    0.842838  0.121699
2019-12-30  28.115  28.335  28.555  27.785  915751.0  000002    0.007825  0.027713     27.765          0.57   2.052944  27.637  27.649  70.882335   0.56  27.204685  26.205033  0.999651    0.874201  0.125451
2019-12-31  28.115  28.945  29.215  28.085  663497.0  000002    0.029522  0.040235     28.335          0.61   2.152815  27.997  27.822  74.238064   1.80  27.472426  26.407994  1.064432    0.912247  0.152185

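3. Extracting the feature variables and the target variable

X = df[['close', 'volume', 'close-open', 'MA5', 'MA10', 'high-low', 'RSI', 'MOM', 'EMA12', 'MACD', 'MACDsignal', 'MACDhist']]
y = np.where(df['price_change'].shift(-1) > 0, 1, -1)

The key point is that today's information is used to predict whether the price rises or falls on the next day, so y must describe the next day's price change.

NumPy's where() function is used as follows:
np.where(condition, value if the condition holds, value otherwise)

df['price_change'].shift(-1) uses shift() to move the price_change column up by one row, so that each row holds the next day's price change.

The condition is therefore whether the next day's price change is greater than 0: y is set to 1 if the price rose on the next day and to -1 otherwise (it fell or was unchanged). This next-day direction is what we predict from the current day's basic data and derived variables.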
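
The labelling rule is easy to trace on a four-row toy series. One detail worth noticing in the sketch below: the last row has no next day, so shift(-1) leaves NaN there, and because NaN > 0 evaluates to False that row is labelled -1 by default. Whether to keep or drop that final row is a modelling choice that the original code does not address.

```python
import numpy as np
import pandas as pd

# Toy daily price changes; values are made up for illustration.
price_change = pd.Series([0.5, -0.2, 0.0, 0.3])

next_change = price_change.shift(-1)   # each row now holds the NEXT day's change
y = np.where(next_change > 0, 1, -1)   # 1 if the next day rises, otherwise -1

print(next_change.tolist())
print(y.tolist())
```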

4. Splitting the data into training and test sets

Next we split the original data set. Note that the split must follow time order rather than use train_test_split() as before. Stock prices evolve over time, and a random split would destroy that ordering: we predict the next day's movement from the current day's data, not from the data of an arbitrary day.
We therefore take the first 90% of the data as the training set and the last 10% as the test set:

X_length = X.shape[0]  # shape gives (rows, columns); shape[0] is the number of rows
split = int(X_length * 0.9)

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

5. Building the model

model = RandomForestClassifier(max_depth=3, n_estimators=10, min_samples_leaf=10, random_state=1)
model.fit(X_train, y_train)
RandomForestClassifier(max_depth=3, min_samples_leaf=10, n_estimators=10,
                       random_state=1)
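
The split-then-fit steps can be tried end to end without downloading any data. The sketch below uses synthetic features and random labels (so the model is not expected to predict anything useful); the column names f1 to f3 are made up. It is meant only to show that every training row precedes every test row.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, date-indexed features and random up/down labels (illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", periods=200, freq="D")
X = pd.DataFrame(rng.normal(size=(200, 3)), index=dates, columns=["f1", "f2", "f3"])
y = np.where(rng.normal(size=200) > 0, 1, -1)

# Chronological split: first 90% for training, last 10% for testing.
split = int(X.shape[0] * 0.9)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestClassifier(max_depth=3, n_estimators=10, min_samples_leaf=10, random_state=1)
model.fit(X_train, y_train)

# Every training date is earlier than every test date.
print(X_train.index.max() < X_test.index.min())
print(model.predict(X_test)[:5])
```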

9.3.2 Using and Evaluating the Model

1. Predicting the next day's rise or fall

y_pred = model.predict(X_test)
print(y_pred)
[-1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1
  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1 -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1]
a = pd.DataFrame()  # create an empty DataFrame
a['predicted'] = list(y_pred)
a['actual'] = list(y_test)
a.head()
   predicted  actual
0         -1      -1
1          1      -1
2         -1      -1
3          1      -1
4          1       1
# View the predicted probabilities (columns follow model.classes_, here -1 then 1)
y_pred_proba = model.predict_proba(X_test)
y_pred_proba[0:5]
array([[0.53462409, 0.46537591],
       [0.49852513, 0.50147487],
       [0.53687766, 0.46312234],
       [0.49733765, 0.50266235],
       [0.49733765, 0.50266235]])

2. Evaluating model accuracy

from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_pred)  # arguments are (true values, predicted values)
print(score)
0.5428571428571428
# Alternatively, score the model with its built-in score() method:
model.score(X_test, y_test)
0.5428571428571428
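
Accuracy is simply the share of predictions that match the actual values. The sketch below checks this on the five prediction/actual pairs shown earlier in this section, where three of the five match.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# The first five predicted and actual values from the comparison table above.
y_pred = np.array([-1, 1, -1, 1, 1])
y_test = np.array([-1, -1, -1, -1, 1])

manual = np.mean(y_pred == y_test)           # fraction of matching positions
library = accuracy_score(y_test, y_pred)     # same quantity via scikit-learn

print(manual, library)
```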

3. Analysing feature importance

model.feature_importances_
array([0.15132672, 0.09957677, 0.05021545, 0.06514831, 0.079073  ,
       0.11447561, 0.04576496, 0.17559964, 0.04713332, 0.07061667,
       0.08866083, 0.01240873])
# The following code presents the features and their importances more clearly:
features = X.columns
importances = model.feature_importances_
a = pd.DataFrame()
a['feature'] = features
a['importance'] = importances
a = a.sort_values('importance', ascending=False)
a
       feature  importance
7          MOM    0.175600
0        close    0.151327
5     high-low    0.114476
1       volume    0.099577
10  MACDsignal    0.088661
4         MA10    0.079073
9         MACD    0.070617
3          MA5    0.065148
2   close-open    0.050215
8        EMA12    0.047133
6          RSI    0.045765
11    MACDhist    0.012409
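
Two properties of feature_importances_ help when reading a table like the one above: the values sum to 1, and a feature that actually drives the label should rank first. The sketch below demonstrates both on synthetic data in which only the column named signal determines y; the column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on the "signal" column.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=["signal", "noise_a", "noise_b"])
y = np.where(X["signal"] > 0, 1, -1)

model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=1).fit(X, y)

# A Series keeps the feature names attached to the importances while sorting.
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking)
print(ranking.sum())
```

With real market data the importances are usually spread far more evenly, as in the table above, which is itself a hint that no single factor dominates.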

9.3.3 Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV  # grid search for suitable hyperparameters
# Specify the range of values for each parameter of the classifier
parameters = {'n_estimators':[5, 10, 20], 'max_depth':[2, 3, 4, 5], 'min_samples_leaf':[5, 10, 20, 30]}
new_model = RandomForestClassifier(random_state=1)  # build the classifier
# cv=6 means 6-fold cross-validation; scoring='accuracy' (the default for classifiers) ranks models by accuracy,
# while scoring='roc_auc' would rank them by the AUC of the ROC curve instead
grid_search = GridSearchCV(new_model, parameters, cv=6, scoring='accuracy')
grid_search.fit(X_train, y_train)  # pass in the data
grid_search.best_params_  # output the best parameter values
{'max_depth': 2, 'min_samples_leaf': 20, 'n_estimators': 10}
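
Since the data are ordered in time, one variation worth considering is to make the cross-validation folds chronological as well, using scikit-learn's TimeSeriesSplit as the cv argument. This is a swapped-in alternative, not what the original code does, and the sketch below runs on synthetic data with a reduced parameter grid to keep it fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic features and random labels, ordered as if by date (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 4))
y = np.where(rng.normal(size=240) > 0, 1, -1)

parameters = {'n_estimators': [5, 10], 'max_depth': [2, 3], 'min_samples_leaf': [5, 10]}

# Each fold trains on an initial stretch of rows and validates on the rows that follow it.
cv = TimeSeriesSplit(n_splits=6)
grid_search = GridSearchCV(RandomForestClassifier(random_state=1), parameters, cv=cv, scoring='accuracy')
grid_search.fit(X, y)

print(grid_search.best_params_)
for train_idx, test_idx in cv.split(X):
    print(train_idx.max(), test_idx.min(), test_idx.max())
```

The printed index ranges show that validation rows always come after the training rows of the same fold.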

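9.3.4 Plotting the Backtest Return Curve

X_test = X_test.copy()  # work on a copy so that new columns can be added safely
X_test['prediction'] = model.predict(X_test)
X_test['p_change'] = (X_test['close'] - X_test['close'].shift(1)) / X_test['close'].shift(1)  # daily return as a fraction

X_test['origin'] = (X_test['p_change'] + 1).cumprod()  # cumulative return of simply holding the stock
X_test['strategy'] = (X_test['prediction'].shift(1) * X_test['p_change'] + 1).cumprod()  # yesterday's prediction applied to today's return

X_test[['strategy', 'origin']].tail()
            strategy    origin
date
2019-12-25  1.248484  1.059319
2019-12-26  1.210183  1.091817
2019-12-27  1.215391  1.087118
2019-12-30  1.190439  1.109436
2019-12-31  1.164811  1.133320
# Drop missing values, plot the returns, and let the x-axis tick labels tilt automatically:
X_test[['strategy', 'origin']].dropna().plot()
plt.gcf().autofmt_xdate()
plt.show()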
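
The two cumulative-return columns can be traced by hand on a four-day toy series. In the sketch below the prices move +10%, -10%, +10%, and the made-up predictions happen to be right every day, so the strategy earns 10% daily (a prediction of -1 is treated as a short position) while buy-and-hold ends at 1.089. The DataFrame name bt and all values are illustrative.

```python
import pandas as pd

bt = pd.DataFrame({
    "close": [10.0, 11.0, 9.9, 10.89],
    "prediction": [1, -1, 1, 1],   # signal produced on each day for the NEXT day
})

bt["p_change"] = bt["close"].pct_change()         # daily return as a fraction
bt["origin"] = (bt["p_change"] + 1).cumprod()     # buy-and-hold cumulative return

# shift(1) aligns yesterday's prediction with today's return, so no future data is used.
bt["strategy"] = (bt["prediction"].shift(1) * bt["p_change"] + 1).cumprod()

print(bt)
```

Dropping the shift(1) would apply each day's prediction to the same day's return, which leaks information the model could not have had at trading time.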


[Figure: cumulative return curves of the strategy and origin columns over the test period]


