时间:2023-10-31来源:系统城装机大师作者:佚名
零、需求介绍
现有一张表数据如下:
此表是一张镜像表,policyno列代表一个保单号,state列代表这个保单号在snapdate当天的最后一次状态(state每天可能会变很多次,镜像表只保留snapdate时间点凌晨的最后一次状态),snapdate代表当天做镜像的时间,现在有个需求,我们想取出来这个保单号连续保持某个状态的起止时间,例如:
保单号sm1保持状态1的起止时间为2021020120210202,然后在20210203时候变成了状态2,又在20210204时候变成了状态3,最终又在2021020520210209时间段保持在状态1,然后镜像表的程序可能期间出现过问题,在20210210开始到20210215日没有镜像成功,直到20210216日才恢复,20210216~20210219日保单号sm1的状态一直保持为1,后续还有可能继续变,那么,上面说的保单sm1的几个状态的连续时间,我们想要的结果为:
1 2 3 4 5 6 7 |
POLICYNO STATE START_DATE END_DATE sm1 1 20210201 20210202 sm1 2 20210203 20210203 sm1 3 20210204 20210204 sm1 1 20210205 20210209 sm1 1 20210216 20210219 ......................... |
我这里提供5种写法,可以归结为两大类:
一类:通过使用分析函数或自关联获取数据连续性,构造一个分组字段进行分组求最大最小值。
二类:通过树形层次查询获取连续性,获取起止时间。
一、通过使用lag分析函数获取前后时间,根据当前时间与前后时间的差值进行判断获取时间连续性标志,然后使用sum()over()对连续性标志进行累加,从而生成一个新的临时分组字段,最终根据policyno,state,临时分组字段进行分组取最大最小值
这里为了好理解,每一个处理步骤都单独写出来了,实际使用中可以简写一下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
with t as --求出来每条数据当天的前一天镜像时间 ( select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a order by a.policyno, a.snapdate), t1 as --判断当天镜像时间和前一天的镜像时间+1是否相等,如果相等就置为0否则置为1,新增临时字段lxzt意为:连续状态标志 ( select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate), t2 as --根据lxzt字段进行sum()over()求和,求出来一个新的用来做分组依据的字段,简称fzyj ( select t1.*, sum (lxzt) over( order by policyno, snapdate) as fzyj from t1) select policyno, --最后根据policyno,state,fzyj进行分组求最大最小值即为状态连续的开始结束时间 state, -- fzyj, min (snapdate) as start_snap, max (snapdate) as end_snap from t2 group by policyno, state, fzyj order by fzyj; |
二、不使用lag分析函数,通过自关联也能判断出来哪些天连续,然后后面操作步骤同上,这个写法算是对lag()over()函数的一个回写,摆脱对分析函数的依赖
下面这种写法,需要读两次表,上面lag的方式是对这个写法的一种优化:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
with t as ( select a.policyno, a.state, a.snapdate, b.snapdate as snap2 from zyd.temp_0430 a, zyd.temp_0430 b where a.policyno = b.policyno(+) and a.state = b.state(+) and a.snapdate - 1 = b.snapdate(+) order by policyno, snapdate), t1 as ( select t.*, case when snap2 is null then 1 else 0 end as lxzt from t order by policyno, snapdate), t2 as ( select t1.*, sum (lxzt) over( order by policyno, snapdate) as fzyj from t1 order by policyno, snapdate) select policyno, state, fzyj, min (snapdate) as start_snap, max (snapdate) as end_snap from t2 group by policyno, state, fzyj order by fzyj; |
三、通过构造树形结构,确定根节点和叶子节点来获取状态连续的开始和结束时间
先按照数据的连续性构造显示每层关系的树状结构:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
with t as ( select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno='sm1' order by a.policyno, a.snapdate), t1 as ( select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate), t2 as ( select t1.*, lpad( '->' , ( level - 1) * 2, '->' ) || snapdate as 树状结构, level as 树中层次, decode( level , 1, 1) 是否根节点, decode(connect_by_isleaf, 1, 1) 是否叶子节点, case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否树杈, ( prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by ( prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate) select * from t2; |
从上面能清晰的看出来,每一次连续状态的开始日期作为每个树的根,分支节点即树杈和叶子节点的关系一步步拓展开来,分析上面数据我们能够知道,如果我们想要获取每个保单状态连续时间范围,以上面的数据现有分布方式,现在就可以:通过policyno,state,主根值进行group by 取snapdate的最大最小值,类似前面两个写法的最终步骤;
接下来,我们这个第三种写法就是按照这个方式写:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
with t as ( select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno='sm1' order by a.policyno, a.snapdate), t1 as ( select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate), t2 as ( select t1.*, lpad( '->' , ( level - 1) * 2, '->' ) || snapdate as 树状结构, level as 树中层次, decode( level , 1, 1) 是否根节点, decode(connect_by_isleaf, 1, 1) 是否叶子节点, case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否树杈, ( prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by ( prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate) select policyno, state, min (snapdate) as start_date, max (snapdate) as end_date from t2 group by policyno, state, 主根值 order by policyno, state; |
四、参照过程三,既然已经获取了每条数据的主根值和叶子节点的值,这就代表了我们知道了每个保单状态的连续开始和结束时间,那直接取出来叶子节点数据,叶子节点主根值就是开始日期,叶子节点的值就是结束日期,这样我们就不需再group by了
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
with t as ( select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno='sm1' order by a.policyno, a.snapdate), t1 as ( select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate), t2 as ( select t1.*, lpad( '->' , ( level - 1) * 2, '->' ) || snapdate as 树状结构, level as 树中层次, decode( level , 1, 1) 是否根节点, decode(connect_by_isleaf, 1, 1) 是否叶子节点, case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否树杈, ( prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by ( prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate) select policyno, state, 主根值 as start_date, snapdate as end_date from t2 where 是否叶子节点 = 1 order by policyno, snapdate |
五、在Oracle10g之前,上面树状查询的关键函数 connect_by_root还不支持,如果使用树形结构,可以通过sys_connect_by_path来实现
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
with t as ( select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim --case when lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) is null then snapdate else lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) end as lag_tim from zyd.temp_0430 a order by a.policyno, a.snapdate), t1 as ( select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate), t2 as ( select t1.*, sys_connect_by_path(snapdate, ',' ) as pt, level , connect_by_isleaf as cb from t1 start with (lxzt = 1) connect by ( prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno)) select t2.*, regexp_substr(pt, '[^,]+' , 1, 1) as start_date, regexp_substr(pt, '[^,]+' , 1, regexp_count(pt, ',' )) as end_date from t2 where cb = 1 order by policyno, state; |
还有好多其他写法,这里不再一一列举!
总结
到此这篇关于Oracle数仓中判断时间连续性的几种SQL写法的文章就介绍到这了
2023-10-31
Oracle如何编写一个sqlldr实例2023-10-31
Oracle的SQLLDR用法简介2023-10-31
Oracle中的高效SQL编写PARALLEL解析1.Oracle数据库系统结构概述 2.Oracle数据库存储结构 物理存储结构 控制文件 数据文件 重做日志文件 归档日志文件 Oracle数据库逻辑结构 数据块(Data Block) (盘)区(Extent) 段(Segment) 表空间(Tablespace) 本地管...
2023-10-31
windows下的Oracle19c 一、官网下载Oracle19c数据库 二、安装Oracle数据库 1.解压安装包 2.运行setup.exe安装 三、配置 四、安装完Oracle数据库,给scott用户解锁 1.解决Oracle数据库中没有scott账户的问题 2.给scott...
2023-10-31