Original article No. 240, focused on "personal growth and financial freedom, the logic of how the world works, and investing".
Today we build a learning-to-rank (L2R) strategy for ETF sector rotation. The GBDT framework we use is LightGBM; its main advantages are speed and strong out-of-the-box performance.
Our candidate universe is 29 sector ETFs:
etfs = [
    '159870.SZ', '512400.SH', '515220.SH', '515210.SH', '516950.SH',
    '562800.SH', '515170.SH', '512690.SH', '159996.SZ', '159865.SZ',
    '159766.SZ', '515950.SH', '159992.SZ', '159839.SZ', '512170.SH',
    '159883.SZ', '512980.SH', '159869.SZ', '515050.SH', '515000.SH',
    '515880.SH', '512480.SH', '515230.SH', '512670.SH', '515790.SH',
    '159757.SZ', '516110.SH', '512800.SH', '512200.SH',
]
The alpha dataset we use:
The factor list here can be extended further, for example replaced with qlib's Alpha158, or supplemented with more technical indicators.
class Alpha:
    def __init__(self):
        pass

    def get_feature_config(self):
        return self.parse_config_to_fields()

    def get_label_config(self):
        return ["shift(close, -5)/shift(open, -1) - 1",
                "qcut(shift(close, -5)/shift(open, -1) - 1, 20)"], ["label_c", 'label']

    @staticmethod
    def parse_config_to_fields():
        # ['CORD30', 'STD30', 'CORR5', 'RESI10', 'CORD60', 'STD5', 'LOW0',
        #  'WVMA30', 'RESI5', 'ROC5', 'KSFT', 'STD20', 'RSV5', 'STD60', 'KLEN']
        fields = []
        names = []
        windows = [5, 10, 20, 30, 60]
        fields += ["corr(close/shift(close,1), log(volume/shift(volume, 1)+1), %d)" % d
                   for d in windows]
        names += ["CORD%d" % d for d in windows]
        fields += ['close/shift(close,20)-1']
        names += ['roc_20']
        return fields, names
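The expression strings above (shift, corr, log, qcut) are evaluated by the framework's own expression engine, which is not shown here. As a rough sketch of what the CORD factor, the ROC factor, and the qcut label could look like in plain pandas (the function names and the single-series qcut are my assumptions, not the framework's code):

```python
import numpy as np
import pandas as pd

def cord_factor(close: pd.Series, volume: pd.Series, d: int) -> pd.Series:
    # corr(close/shift(close,1), log(volume/shift(volume,1)+1), d):
    # rolling d-day correlation of 1-day price ratios with log volume ratios
    ret = close / close.shift(1)
    vol = np.log(volume / volume.shift(1) + 1)
    return ret.rolling(d).corr(vol)

def roc_20(close: pd.Series) -> pd.Series:
    # close/shift(close,20) - 1: 20-day rate of change
    return close / close.shift(20) - 1

def make_labels(df: pd.DataFrame, horizon: int = 5, bins: int = 20):
    # label_c: buy at next day's open, hold to the close `horizon` days out
    label_c = df['close'].shift(-horizon) / df['open'].shift(-1) - 1
    # label: the same return bucketed into `bins` quantile ranks --
    # a ranker wants an ordinal relevance label, not a raw float
    label = pd.qcut(label_c, bins, labels=False, duplicates='drop')
    return label_c, label
```

In a real cross-sectional setup the qcut bucketing would presumably be applied per trading day across the 29 ETFs rather than over one long series, so that ranks are comparable within each rebalance date.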
Below is the gradient boosting tree (ranker) code:
from quant.datafeed.dataset import DataSet
import joblib
import os


class LGBModel:
    def __init__(self, load_model=False, feature_cols=None):
        self.feature_cols = feature_cols
        if load_model:
            path = os.path.dirname(__file__)
            self.ranker = joblib.load(path + '/lgb.pkl')

    def _prepare_groups(self, df):
        df['day'] = df.index
        group = df.groupby('day')['day'].count()
        return group

    def predict(self, data):
        data = data.copy(deep=True)
        if self.feature_cols:
            data = data[self.feature_cols]
        pred = self.ranker.predict(data)
        return pred

    def train(self, ds: DataSet):
        X_train, X_test, y_train, y_test = ds.get_split_data()
        X_train_data = X_train.drop('symbol', axis=1)
        X_test_data = X_test.drop('symbol', axis=1)
        # X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=1)
        query_train = self._prepare_groups(X_train.copy(deep=True)).values
        query_val = self._prepare_groups(X_test.copy(deep=True)).values
        query_test = [X_test.shape[0]]

        import lightgbm as lgb
        gbm = lgb.LGBMRanker()
        gbm.fit(X_train_data, y_train, group=query_train,
                eval_set=[(X_test_data, y_test)], eval_group=[query_val],
                eval_at=[5, 10, 20], early_stopping_rounds=50)
        print(gbm.feature_importances_)
        joblib.dump(gbm, 'lgb.pkl')
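LGBMRanker treats each trading day as one "query": the `group` array must list, in row order, how many candidate rows each day contributes, and its entries must sum to the total row count. Note that `_prepare_groups` above uses a default `groupby`, which sorts the day keys, so it relies on the frame already being sorted by date. A standalone sketch of the same logic that preserves order of appearance (the function name is mine):

```python
import pandas as pd

def prepare_groups(df: pd.DataFrame):
    # One count per trading day, in order of first appearance --
    # this is the shape of the `group`/`eval_group` argument
    # that LGBMRanker.fit expects.
    return df.groupby(df.index, sort=False).size().values
```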
Once the model is trained, we load it and run predictions through the algo framework's machine-learning module:
from quant.context import ExecContext


class ModelPredict:
    def __init__(self, model):
        self.model = model

    def __call__(self, context: ExecContext):
        context.bar_df['pred_score'] = self.model.predict(context.bar_df)
        return False  # False: let the remaining algos keep processing
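Returning False from `__call__` tells the engine to hand the context to the next algo, while returning True presumably short-circuits the chain (e.g. RunWeekly skipping non-rebalance days). A minimal sketch of that convention, with a hypothetical stand-in for the framework's context and runner:

```python
class MiniContext:
    # hypothetical stand-in for the framework's ExecContext
    def __init__(self):
        self.bar_df = None

def run_algos(context, algos):
    # Run algos in order; the first one returning True stops the
    # chain, while False passes control to the next algo.
    for algo in algos:
        if algo(context):
            break
    return context
```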
The glue code stays concise:
from quant.context import ExecContext
from quant.algo.algos import *
from quant.algo.algo_model import ModelPredict
from quant.models.gbdt_l2r import LGBModel

env = Env(ds)
model = LGBModel(load_model=True, feature_cols=ds.features)
env.set_algos([
    RunWeekly(),
    ModelPredict(model=model),
    SelectTopK(K=2, order_by='pred_score', b_ascending=False),
    WeightEqually()
])
env.backtest_loop()
env.show_results()
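SelectTopK and WeightEqually do what their names suggest: keep the two best-scored ETFs each week and split capital evenly between them. A pandas sketch of the two steps (the function names are mine; the framework's internals may differ):

```python
import pandas as pd

def select_top_k(bar_df: pd.DataFrame, k: int,
                 order_by: str = 'pred_score',
                 ascending: bool = False) -> pd.DataFrame:
    # Keep the k rows with the best score (ascending=False -> highest first)
    return bar_df.sort_values(order_by, ascending=ascending).head(k)

def weight_equally(selected: pd.DataFrame) -> pd.Series:
    # 1/N weight for each selected instrument
    return pd.Series(1.0 / len(selected), index=selected.index)
```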
Annualized return 33.8%, Sharpe ratio 1.22. (The integrated backtest framework code, data, and strategy have all been uploaded to the knowledge planet; please go there to download my open-source project.)
As a comparison, if we instead buy the ETFs with the worst pred_score, the result comes out as shown below, which indirectly confirms that our ranking is doing its job.
Tomorrow I will expand the alpha factor set and add walk-forward (rolling-window) backtesting to see how it performs.
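Walk-forward backtesting retrains on a sliding window and evaluates on the period just after it, so every prediction is out-of-sample. A generic split generator as a sketch of the idea (not part of the framework):

```python
def walk_forward_splits(n: int, train_size: int, test_size: int, step: int = 0):
    # Yield (train_indices, test_indices) pairs sliding forward through
    # n time-ordered samples; by default the window advances by test_size,
    # so test periods tile the history without overlap.
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step
```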
Sitting idle in a quiet courtyard before idle flowers,
gently simmering time, slowly brewing tea;
asking nothing of the world's affairs,
letting the years frost my hair as they will.
A life of financial freedom is worth looking forward to.