逾期风控算法比赛

news/2025/1/11 14:55:31/

逾期风控算法比赛

文章目录

    • **逾期风控算法比赛**
      • 1.比赛及数据概况
      • 2.衍生变量
        • 2.1 变量衍生思路
          • 2.1.1 payment与order表
          • 2.1.2 payment与iot表
          • 2.1.3 payment与payment表
          • 2.1.4 原生变量
        • 2.2 特征选取结果
      • 3.模型训练
      • 4.模型效果
      • 5.困难点
        • 1. 如何获取还款日期前特定日期的贷款次数、逾期预警次数及地理位置变化数、每设备数
        • 2. 如何在有限的空间内获取每台设备在特定时间段内的地理位置变化数
        • 3.如何正确获取还款日前特定时间内预期预警次数
        • 4.如何获取每个人在还款日前特定时间内的地理位置变化数

1.比赛及数据概况

  1. 比赛任务:预测某客户群体中的每位客户未来三个月是否会逾期(overdue),即 Y(1:是,0:否)。以f1_score衡量预测效果。提供数据样本及预测任务如下图。
    在这里插入图片描述
  2. 比赛提供三张表:客户基本信息order表,还款记录payment表,以及其贷款购买的设备工况信息iot表。详细情况如下图。
    在这里插入图片描述

其中,payment表中有44854条数据,有42433条数据Y值非空。

2.衍生变量

备注:1.贷款次数特定时间段指的是还款日期前7天、半个月、一个月、三个月、六个月、九个月、一年;
2.预期预警、地理位置变化、每个人设备数、每一设备多少人用的特定时间段指的是还款日期前3天、7天、15天、30天、90、180、270、360天。

2.1 变量衍生思路

2.1.1 payment与order表
  1. 每个客户、每台设备在还款日前特定时间段内发生贷款的次数。用0补充均值
    customer_id_D3_num
  2. 每个客户在还款日前特定时间段内发生贷款的次数。用0补充均值
    num_of_posting_before_payment_7days
  3. 每个客户在还款日前特定时间段内用了多少台不一样的设备贷款。用0补充均值
    customer_id_D3_orgnum
  4. 每台设备在还款日前特定时间段内有多少个不一样的用户在使用。用0补充均值
    device_code_D3_orgnum
2.1.2 payment与iot表
  1. 每台设备在还款日前特定时间段内发生地理位置变化的个数(无变化为1)。地理位置变化数表用1补充均值,与总表合并后用-99补充均值。
    num_of_latitude_int_before_15day
  2. 每个人在还款日前特定时间段内发生地理位置变化的个数(多台设备地理位置变化次数的加总)。与总表合并后用-99补充均值。
    num_of_longitude_int_before_7dayper_person
2.1.3 payment与payment表
  1. 每个客户在还款日前特定时间段内发生逾期预警的个数。
    num_of_notify_before_7day
  2. 还款期数/总期数(QC/RZQS)
  3. 还款日期、提取出年份和月份
    SSMONTH,ssmonth_year,ssmonth_month
  4. 是否第一期还款出现逾期
    if_first_post_and_notify
2.1.4 原生变量
  1. iot表
    (1) 将latitude,longitude取整,用每个设备的均值补空值,用全体均值补充空值
    (2) work_sum_time按设备、日期排序,用临近值补空值,用全体均值补充空值
  2. payment表
    (1) 代理商编号
    (2) 融资期数,期次
    (3) 是否预警

2.2 特征选取结果

'device_code', 'customer_id','SSMONTH', 'DLSBH_ENCODER', 'RZQS','QC', 'notified', 'num_of_posting_before_payment_7days','num_of_posting_before_payment_15days', 'num_of_posting_before_payment_30days', 'num_of_posting_before_payment_90days','num_of_posting_before_payment_180days','num_of_posting_before_payment_270days','num_of_posting_before_payment_365days','num_of_posting_before_payment_all_day', 'work_sum_time','num_of_longitude_int_before_7day', 'num_of_latitude_int_before_7day','num_of_longitude_int_before_15day', 'num_of_latitude_int_before_15day','num_of_longitude_int_before_30day', 'num_of_latitude_int_before_30day','num_of_longitude_int_before_90day', 'num_of_latitude_int_before_90day','num_of_longitude_int_before_180day','num_of_latitude_int_before_180day','num_of_longitude_int_before_270day','num_of_latitude_int_before_270day','num_of_longitude_int_before_365day','num_of_latitude_int_before_365day', 'latitude_int', 'longitude_int','if_first_post_and_notify', 'QC_RZQS', 'customer_id_D3_num','customer_id_D3_orgnum', 'customer_id_D7_num', 'customer_id_D7_orgnum','customer_id_D15_num', 'customer_id_D15_orgnum', 'customer_id_D30_num','customer_id_D30_orgnum', 'customer_id_D90_num','customer_id_D90_orgnum', 'customer_id_D180_num','customer_id_D180_orgnum', 'customer_id_D270_num','customer_id_D270_orgnum', 'customer_id_D360_num','customer_id_D360_orgnum', 'device_code_D3_num','device_code_D3_orgnum', 'device_code_D7_num', 'device_code_D7_orgnum','device_code_D15_num', 'device_code_D15_orgnum', 'device_code_D30_num','device_code_D30_orgnum', 'device_code_D90_num','device_code_D90_orgnum', 'device_code_D180_num','device_code_D180_orgnum', 'device_code_D270_num','device_code_D270_orgnum', 'device_code_D360_num','device_code_D360_orgnum', 'num_of_notify_before_7day','num_of_notify_before_15day', 'num_of_notify_before_30day','num_of_notify_before_90day', 'num_of_notify_before_180day','num_of_notify_before_270day', 'num_of_notify_before_365day','num_of_notify_before_all_day','num_of_longitude_int_before_7dayper_person','num_of_latitude_int_before_7dayper_person','num_of_longitude_int_before_15dayper_person','num_of_latitude_int_before_15dayper_person','num_of_longitude_int_before_30dayper_person','num_of_latitude_int_before_30dayper_person','num_of_longitude_int_before_90dayper_person','num_of_latitude_int_before_90dayper_person','num_of_longitude_int_before_180dayper_person','num_of_latitude_int_before_180dayper_person','num_of_longitude_int_before_270dayper_person','num_of_latitude_int_before_270dayper_person','num_of_longitude_int_before_365dayper_person','num_of_latitude_int_before_365dayper_person'

3.模型训练

lgbm=lightgbm.train(train_set=lightgbm.Dataset(data_x, label=data_y), num_boost_round=500, params={"objective": "binary", "learning_rate": 0.01, "max_depth": 5, "num_leaves": 128, "verbose": -1, "bagging_fraction": 0.8, "feature_fraction": 0.9,'lambda_l1':0.4,'lambda_l2':0.5})
joblib.dump(lgbm,'/home/workspace/output/process_dir/lgbm_0614.model')
lgbm=joblib.load(os.path.join(path,"lgbm_0614.model")) payment_1=payment.copy()
payment_1["预测得分"] = lgbm.predict(data_x)
payment_1 = payment_1.sort_values("预测得分", ascending=False, ignore_index=True)
print('预测得分计算出并赋值!')
payment_1['Y']=0
payment_1.loc[:int(0.15 * len(payment_1)), ["Y"]] = 1
print('Y已生成!')

4.模型效果

  1. 验证集验证结果为0.56
  2. 训练集训练结果为0.58
  3. 线上提交得分为0.50

公榜提交结果如下图
在这里插入图片描述

私榜提交结果如下图
在这里插入图片描述

5.困难点

1. 如何获取还款日期前特定日期的贷款次数、逾期预警次数及地理位置变化数、每设备数

#获取每个还款日期前每个设备、每个人贷款次数;每个人用多少台设备;每台设备多少人用
df_iot=pd.read_csv('/home/workspace/output/iot_a.csv')
df_order=pd.read_csv('/home/workspace/output/orders_a.csv')
df_payment=pd.read_csv('/home/workspace/output/payment_a.csv')df_order_1=df_order.copy()
df_order_1['posting_date'] = pd.to_datetime(df_order_1['posting_date'])
df_payment_1=df_payment[['device_code','customer_id','SSMONTH']].copy()
df_payment_1['SSMONTH_date'] = df_payment_1['SSMONTH'].astype('str').apply(lambda x: x+'31' if x[-2:] in['01','03','05','07','08','10',12']else x+'28' if x[-2:] == '02' else x+'30')
df_payment_1['SSMONTH_date'] = pd.to_datetime(df_payment_1['SSMONTH_date'])day_lst = [3,7,15,30,90,180,270,360]
day_name = ['D3', 'D7', 'D15', 'D30', 'D90', 'D180', 'D270', 'D360']
day_dic = dict(zip(day_lst,day_name))
# 拼接左表为还款记录
df_left = df_payment_1[['device_code','customer_id','SSMONTH','SSMONTH_date']].copy()
# 右表为过账记录表
df_right  = df_order_1.copy()
# 还款记录的3个主键用于接收变量
dt = df_payment_1[['device_code','customer_id','SSMONTH','SSMONTH_date']].copy()# 从customer_id维度拼接
df_merge = pd.merge(df_left,df_right,on = 'customer_id')
df_merge.rename(columns = {'device_code_x':'device_code'},inplace = True)
# 日期相减
df_merge['time_delta'] =df_merge['SSMONTH_date']  -  df_merge['posting_date']for i in day_lst:df_time = df_merge.loc[(df_merge['time_delta'] <= timedelta(days = i))&(df_merge['time_delta'] >= timedelta(days = 0))]# customer计数dt_tmp = df_time.groupby(['device_code','customer_id','SSMONTH_date']).size().reset_index()dt = pd.merge(dt,dt_tmp,on = ['device_code','customer_id','SSMONTH_date'],how = 'left')dt.rename(columns = {0:'customer_id'+'_'+day_dic[i]+'_num'},inplace = True)
#     dt# customer 对应的device的个数dt_tmp = df_time.groupby(['customer_id','SSMONTH_date'])['device_code_y'].nunique().reset_index()dt = pd.merge(dt,dt_tmp,on = ['customer_id','SSMONTH_date'],how = 'left')dt.rename(columns = {'device_code_y':'customer_id'+'_'+day_dic[i]+'_orgnum'},inplace = True)
dt.columns

2. 如何在有限的空间内获取每台设备在特定时间段内的地理位置变化数

# 生成在还款表中出现的设备列表 df_payment_iot_list
df_payment_6=df_payment[['device_code']].copy()
df_payment_6.drop_duplicates('device_code',inplace=True)
df_iot_2=df_iot_1[['device_code_iot']].copy()
df_iot_2.drop_duplicates('device_code_iot',inplace=True)
df_payment_6['if_exist']=1
df_payment_iot=pd.merge(df_payment_6,df_iot_2,\left_on='device_code',right_on='device_code_iot',how='right')
df_payment_iot=df_payment_iot.loc[df_payment_iot['if_exist'].notnull()]
df_payment_iot.drop(columns=['if_exist','device_code'],inplace=True)
df_payment_iot.drop_duplicates('device_code_iot',inplace=True)
df_payment_iot_list=df_payment_iot['device_code_iot'].tolist()begin_time=time.time()
month_list=[3,7,15,30,90,180,270,360]
location_list=['latitude_int','longitude_int']
for i in month_list:globals()['df_payment_iot_concat_'+str(i)]=pd.DataFrame()for h in range(0,len(df_payment_iot_list),10):df_payment_excess=df_payment.loc[df_payment['device_code'].isin(df_payment_iot_list[h:h+10])]df_payment_excess['SSMONTH_date']=df_payment_excess['SSMONTH'].astype('str').apply(lambda x:\x+'31' if x[-2:] in ['01','03','05','07','08','10','12']\else x+'28' \if x[-2:]=='02' else x+'30')df_payment_excess['SSMONTH_date']=pd.to_datetime(df_payment_excess['SSMONTH_date'])df_iot_excess=df_iot_1.loc[df_iot_1['device_code_iot'].isin(df_payment_iot_list[h:h+10])]df_iot_excess.columns=['device_code', 'work_sum_time', 'reporttime', 'latitude_int', 'longitude_int']df_payment_iot_excess=pd.merge(df_payment_excess[['device_code','SSMONTH_date','SSMONTH']],df_iot_excess\[['device_code','reporttime','latitude_int','longitude_int']],\on='device_code',how='left')df_payment_iot_excess['day_delta']=df_payment_iot_excess.apply(lambda x : (x['SSMONTH_date']-x['reporttime']).days,axis=1)df_payment_iot_excess['if_before_'+str(i)+'day' ]=df_payment_iot_excess.apply(lambda x:1 if (x['day_delta']<=i and x\['day_delta']>=0) else 0,axis=1)df_payment_iot_excess_merge_lat=df_payment_iot_excess.loc[df_payment_iot_excess['if_before_'+str(i)+'day']==1].\drop_duplicates(['latitude_int','device_code','SSMONTH'])df_payment_iot_excess_merge_lat=df_payment_iot_excess_merge_lat[['latitude_int','device_code','SSMONTH']].\groupby(['device_code', 'SSMONTH']).count()df_payment_iot_excess_merge_lat.columns=['num_of_latitude_int_before_'+str(i)+'day']df_payment_iot_excess_merge_long=df_payment_iot_excess.loc[df_payment_iot_excess['if_before_'+str(i)+'day']==1].\drop_duplicates(['longitude_int','device_code','SSMONTH'])df_payment_iot_excess_merge_long=df_payment_iot_excess_merge_long[['longitude_int','device_code','SSMONTH']].\groupby(['device_code', 'SSMONTH']).count()df_payment_iot_excess_merge_long.columns=['num_of_longitude_int_before_'+str(i)+'day']df_payment_iot_excess_merge=pd.merge(df_payment_iot_excess_merge_long,df_payment_iot_excess_merge_lat,\on=['device_code', 'SSMONTH'],how='inner')df_payment_iot_excess_merge.reset_index(inplace=True)globals()['df_payment_iot_concat_'+str(i)]=pd.concat([globals()['df_payment_iot_concat_'+str(i)],df_payment_iot_excess_merge])print(len(globals()['df_payment_iot_concat_'+str(i)]))print(globals()['df_payment_iot_concat_'+str(i)].info())print(globals()['df_payment_iot_concat_'+str(i)].shape)globals()['df_payment_iot_concat_'+str(i)].to_csv('/home/workspace/output/feature_derive/process_location_change/df_payment_iot_concat_%s_fillna.csv' \%(str(i)),index=False)
end_time=time.time()
print((end_time-begin_time)/3600)

3.如何正确获取还款日前特定时间内预期预警次数

df_payment_5为左表,df_payment_6为右表。左表保存customer_id,SSMONTH,右表保存customer_id,SSMONTH,notified。

# 获取每个人还款日期前逾期预警次数变量
df_payment_5=df_payment[['customer_id', 'SSMONTH']].copy()
df_payment_5.columns=['customer_id', 'SSMONTH_1']
df_payment_6=df_payment[['customer_id', 'SSMONTH','notified']].copy()
df_payment_5['SSMONTH_date']=df_payment_5['SSMONTH_1'].astype('str').apply(lambda x:x+'31' if x[-2:] in ['01','03','05','07','08','10','12'] \else x+'28'  if x[-2:]=='02' else x+'30')
df_payment_5['SSMONTH_date']=pd.to_datetime(df_payment_5['SSMONTH_date'])
df_payment_6['SSMONTH_date']=df_payment_6['SSMONTH'].astype('str').apply(lambda x:x+'31' if x[-2:] in ['01','03','05','07','08','10','12'] \else x+'28'  if x[-2:]=='02' else x+'30')
df_payment_6['SSMONTH_date']=pd.to_datetime(df_payment_6['SSMONTH_date'])
df_payment_5.rename(columns={'SSMONTH_date':'SSMONTH_date_1'},inplace=True)
df_payment_6.rename(columns={'SSMONTH_date':'SSMONTH_date_2'},inplace=True)
df_payment_3=pd.merge(df_payment_5[['customer_id','SSMONTH_1','SSMONTH_date_1']],df_payment_6[['customer_id','notified','SSMONTH_date_2']]\,on='customer_id',how='left')# 计算还款日期前7,15,30,90,180,270,365天、一直的预期预警个数
month_list=[7,15,30,90,180,270,365]
df_payment_3['day_delta']=df_payment_3.apply(lambda x : (x['SSMONTH_date_1']-x['SSMONTH_date_2']).days ,axis=1)
for i in month_list:df_payment_3['if_notify_before_'+str(i)+'day' ]=df_payment_3.apply(lambda x:1 if (x['day_delta']<=i and x['day_delta']>=0) else 0,axis=1)# 生成if_notifu_before*day变量,用于选取处在特定时间段的样本,用于计算该时间段内预警个数df_payment_3_merge=df_payment_3.loc[df_payment_3['if_notify_before_'+str(i)+'day']==1][['customer_id', 'SSMONTH_1', 'notified']]\.groupby(['customer_id', 'SSMONTH_1']).sum()df_payment_3_merge.columns=['num_of_notify_before_'+str(i)+'day']df_payment_3.drop(columns='if_notify_before_'+str(i)+'day',inplace=True)#将customer_id,SSMONTH_1作为连接键df_payment_3=pd.merge(df_payment_3,df_payment_3_merge,on=['customer_id', 'SSMONTH_1'],how='left')
df_payment_3['if_notify_before_all_day' ]=df_payment_3.apply(lambda x:1 if  x['day_delta']>=0 else 0,axis=1)
df_payment_3_merge=df_payment_3.loc[df_payment_3['if_notify_before_all_day']==1][['customer_id', 'SSMONTH_1', 'notified']]\
.groupby(['customer_id', 'SSMONTH_1']).sum()
df_payment_3_merge.columns=['num_of_notify_before_all_day']
df_payment_3.drop(columns='if_notify_before_all_day',inplace=True)
df_payment_3=pd.merge(df_payment_3,df_payment_3_merge,on=['customer_id', 'SSMONTH_1'],how='left')
print(df_payment_3.columns)
df_payment_3.head(20)df_payment_3.drop(columns=['SSMONTH_date_2','day_delta','notified'],inplace=True)
df_payment_3.drop_duplicates(['customer_id','SSMONTH_date_1'],inplace=True)
print(df_payment_3.columns)
print(df_payment_3.info())

4.如何获取每个人在还款日前特定时间内的地理位置变化数

df_left=df_payment[['device_code','customer_id','SSMONTH']].copy()df_right=df_payment_iot_concat.copy()df_tmp=pd.merge(df_left,df_right,on=['device_code','SSMONTH'],how='inner')df_tmp.drop(columns='device_code',inplace=True)df_payment_iot_concat_dropdup_per_person=(df_tmp.groupby(['customer_id','SSMONTH']).sum()-df_tmp.groupby(\['customer_id','SSMONTH']).\count()).reset_index()for i in df_payment_iot_concat_dropdup_per_person.columns[2:]:df_payment_iot_concat_dropdup_per_person.rename(columns={i:i+'per_person'},inplace=True)df_all=pd.merge(df_all,df_payment_iot_concat_dropdup_per_person,on=['customer_id','SSMONTH'],how='left')fill_col=['num_of_longitude_int_before_7dayper_person','num_of_latitude_int_before_7dayper_person','num_of_longitude_int_before_15dayper_person','num_of_latitude_int_before_15dayper_person','num_of_longitude_int_before_30dayper_person','num_of_latitude_int_before_30dayper_person','num_of_longitude_int_before_90dayper_person','num_of_latitude_int_before_90dayper_person','num_of_longitude_int_before_180dayper_person','num_of_latitude_int_before_180dayper_person','num_of_longitude_int_before_270dayper_person','num_of_latitude_int_before_270dayper_person','num_of_longitude_int_before_365dayper_person','num_of_latitude_int_before_365dayper_person']for i in fill_col:df_all[i].fillna(-99,inplace=True)

http://www.ppmy.cn/news/129904.html

相关文章

synchronized解析

synchronized解析 1. 用户态与内核态 JDK早期&#xff0c;synchronized 叫做重量级锁&#xff0c; 因为申请锁资源必须通过kernel, 系统调用 ;hello.asm ;write(int fd, const void *buffer, size_t nbytes)section datamsg db "Hello", 0xAlen equ $ - msgsectio…

synchronized

用户态与内核态 JDK早期&#xff0c;synchronized 叫做重量级锁&#xff0c; 因为申请锁资源必须通过kernel, 系统调用 ;hello.asm ;write(int fd, const void *buffer, size_t nbytes)section datamsg db "Hello", 0xAlen equ $ - msgsection .text global _start…

最新【2021.1.28】今日头条_signature 分析

最新【2021.1.28】头条_signature 分析 【温馨提示】:此文仅适用PC端web版本某日某条加密 最新今日头条sign加密更新了,看了看,比之前的坑增加了许多; 今日头条web版的请求主要参数是:_signature,已经取消了之前的as、cp参数; 话不多说,开整。。。 1.参数定位 这个…

从ARM镜像的三种方式看统一镜像构建

Java虚拟机所提出的“一处编译、处处运行”的机制使得Java得到了极大的发展&#xff0c;不依赖与底层的操作系统&#xff0c;在JVM层进行解耦&#xff0c;同一份Java字节码在不同平台的运行真正实现了跨操作系统的需求&#xff0c;从此Java应用基本上默认被认为跨操作系统可用的…

JNA 中 GetProcAddress(HMODULE hmodule, int ordinal) 的正确使用方式。LoadLibrary

随着对JNA的深入&#xff0c;需要用到GetProcAddress(HMODULE hmodule, int ordinal)方法&#xff0c;在C语言中第二个参数是一个字符串&#xff08;通过传入函数的名称得到函数回调地址&#xff09;&#xff0c;但是在JNA中却没提供传字符串的方法&#xff0c;个人看了一下源码…

为何电脑系统相对通用而手机却相对定制

相对而言&#xff0c;开发一款手机操作系统并不难&#xff0c;难在做生态&#xff0c;丰富多彩的软件库&#xff0c;才是系统得已广泛推广的基础啊。 有一篇博客有如下疑问&#xff1a; 其实&#xff0c;如果只是驱动屏幕/内存/触摸等&#xff0c;答案是确定的&#xff0c;所有…

Android C++层的内存回收机制

目录(?)[-] 关于C Layer的内存回收机制 RefBase类创建sp和wp对象关于promote用法 1关于C Layer的内存回收机制 Android C层的内存收回主要是通过三个类来实现&#xff0c;分别是RefBase&#xff0c;sp&#xff0c;wp; SP和WP是两个智能指针模板类&#xff0c;sp是strong poin…

fi sap 凭证冲销 稅_sap如何冲销凭证 SAP技术篇之一SAP凭证冲销总结 - 电脑常识 - 服务器之家...

sap如何冲销凭证 SAP技术篇之一SAP凭证冲销总结 发布时间:2017-05-21 来源:服务器之家 目录 SAP凭证冲销总结... 1 一. 凭证冲销介绍... 3 二. 财务模块(FI) 4 三. 物资模块(MM) 7 四. 销售模块(SD) 9 五. 成本核算模块(CO) 11 一.凭证冲销介绍 财务…