- 数据和目标
有如下数据存储了各组件的有效期间(此处起止日期用数字代替以便查阅),目标为将有重叠的期间合并到一起。
sql">SQL> SELECT * FROM demo_eff_periods;COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------1 1000 19991 1500 30001 3000 39991 4000 49992 1000 49992 2000 39993 1000 19993 1500 29993 1000 39994 1000 19994 2000 29994 3000 39995 1000 19995 1000 19995 4000 499915 rows selected
将数据用下图表示,除组件1的第二段区间为闭区间,其它均为左闭右开。
- 例程一:转化为连续日期合并
先使用递归语法把每段期间拆解出具体的每一天,再将连续的日期整合为新区间。
这里使用dense_rank是因为可能有重复的日期出现。
如果出现上一期间的结束日期与下一期间的开始日期刚好连续的情况,这个方法将会把这两个区间合并到一起。
sql">SQL> SELECT component_item_id,2 MIN(new_date) effectivity_date,3 MAX(new_date) disable_date4 FROM (SELECT dep.component_item_id,5 dep.effectivity_date + LEVEL - 1 new_date,6 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.effectivity_date + LEVEL) dr7 FROM demo_eff_periods dep8 CONNECT BY LEVEL + dep.effectivity_date - 1 <= dep.disable_date9 AND PRIOR dep.rowid = dep.rowid10 AND PRIOR sys_guid() IS NOT NULL)11 GROUP BY component_item_id, new_date - dr;COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------1 1000 49992 1000 49993 1000 39994 1000 39995 1000 19995 4000 49996 rows selected
- 例程二:递归串联区间合并
使用RSF语法,先找到所有不被其它区间包含的期间开始日期,再递归串联后续区间,最终得以合并。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
sql">SQL> WITH rsf_eff_periods(component_item_id, effectivity_date, disable_date) AS2 (SELECT dep.component_item_id, dep.effectivity_date, dep.disable_date3 FROM demo_eff_periods dep4 WHERE NOT EXISTS5 (SELECT 16 FROM demo_eff_periods dep07 WHERE dep0.component_item_id = dep.component_item_id8 AND dep0.effectivity_date < dep.effectivity_date9 AND dep0.disable_date >= dep.effectivity_date /*- 1*/)10 UNION ALL11 SELECT rep.component_item_id, rep.effectivity_date, dep.disable_date12 FROM rsf_eff_periods rep, demo_eff_periods dep13 WHERE rep.component_item_id = dep.component_item_id14 AND rep.disable_date >= dep.effectivity_date /*- 1*/15 AND rep.disable_date < dep.disable_date)16 SELECT component_item_id, effectivity_date, MAX(disable_date) disable_date17 FROM rsf_eff_periods18 GROUP BY component_item_id, effectivity_date;COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------1 1000 39991 4000 49992 1000 49993 1000 39994 1000 19994 2000 29994 3000 39995 1000 19995 4000 49999 rows selected
CONNECT BY写法:
sql">SELECT component_item_id, effectivity_date, MAX(disable_date) disable_dateFROM (SELECT dep.component_item_id,connect_by_root effectivity_date effectivity_date,dep.disable_dateFROM demo_eff_periods depSTART WITH NOT EXISTS(SELECT 1FROM demo_eff_periods dep0WHERE dep0.component_item_id = dep.component_item_idAND dep0.effectivity_date < dep.effectivity_dateAND dep0.disable_date >= dep.effectivity_date /*- 1*/)CONNECT BY PRIOR dep.component_item_id = dep.component_item_idAND PRIOR dep.disable_date >= dep.effectivity_date /*- 1*/AND PRIOR dep.disable_date < dep.disable_date)GROUP BY component_item_id, effectivity_date;
- 例程三:区间端点排序重组
分别找到不被其它期间包含的开始日期和结束日期,即合并后应得的区间端点,两类端点(开始和结束)分别排序,同位次的端点即组成新区间。
这里使用dense_rank是因为可能有重复的日期出现。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
sql">SQL> SELECT component_item_id,2 MIN(new_date) effectivity_date,3 MAX(new_date) disable_date4 FROM (SELECT dep.component_item_id,5 dep.effectivity_date new_date,6 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.effectivity_date) dr7 FROM demo_eff_periods dep8 WHERE NOT EXISTS9 (SELECT 110 FROM demo_eff_periods dep011 WHERE dep0.component_item_id = dep.component_item_id12 AND dep0.effectivity_date < dep.effectivity_date13 AND dep0.disable_date >= dep.effectivity_date /*- 1*/)14 UNION ALL15 SELECT dep.component_item_id,16 dep.disable_date new_date,17 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.disable_date) dr18 FROM demo_eff_periods dep19 WHERE NOT EXISTS20 (SELECT 121 FROM demo_eff_periods dep022 WHERE dep0.component_item_id = dep.component_item_id23 AND dep0.effectivity_date <= dep.disable_date /*+ 1*/24 AND dep0.disable_date > dep.disable_date))25 GROUP BY component_item_id, dr;COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------1 1000 39991 4000 49992 1000 49993 1000 39994 1000 19994 2000 29994 3000 39995 1000 19995 4000 49999 rows selected
- 例程四:区间端点排序重组(减少扫描表次数)
对于每一个原始期间,如有比自己早开始的期间其结束日期比自己的开始日期还大,则自己的开始日期将被吞并;如有比自己晚结束的期间其开始日期比自己的结束日期还小,则自己的结束日期将被吞并。
筛选出合并后应得的区间端点后,两类端点(开始和结束)分别排序,同位次的端点即组成新区间。
这里使用dense_rank是因为可能有重复的日期出现。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
sql">SQL> SELECT component_item_id,2 MIN(new_date) effectivity_date,3 MAX(new_date) disable_date4 FROM (SELECT component_item_id,5 new_date,6 dense_rank() over(PARTITION BY component_item_id, date_pos ORDER BY new_date) dr7 FROM (SELECT dep.component_item_id,8 CASE9 WHEN MAX(dep.disable_date)10 over(PARTITION BY dep.component_item_id ORDER BY11 dep.effectivity_date12 RANGE BETWEEN unbounded preceding AND 113 preceding) >= dep.effectivity_date /*- 1*/ THEN14 NULL15 ELSE16 dep.effectivity_date17 END AS effectivity_date,18 CASE19 WHEN MIN(dep.effectivity_date)20 over(PARTITION BY dep.component_item_id ORDER BY21 dep.disable_date DESC22 RANGE BETWEEN unbounded preceding AND 123 preceding) <= dep.disable_date /*+ 1*/ THEN24 NULL25 ELSE26 dep.disable_date27 END AS disable_date28 FROM demo_eff_periods dep)29 UNPIVOT(new_date FOR date_pos IN(effectivity_date, disable_date)))30 GROUP BY component_item_id, dr;COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------1 1000 39991 4000 49992 1000 49993 1000 39994 1000 19994 2000 29994 3000 39995 1000 19995 4000 49999 rows selected
更多方法,欢迎大家讨论