连续活跃天数统计

devtools/2024/9/23 2:20:37/

连续活跃天数统计

需求说明

什么是连续出现?

假设有如下日期信息: 20230401,20230402,20230403,20230405,20230406,20230407,20230410,20230411
则:
20230401-20230403 为一次连续出现,连续出现天数为 3
20230405-20230407 为一次连续出现,连续出现天数为 3
20230410-20230411 为一次连续出现,连续出现天数为 2


在一些业务场景下,我们需要找符合类似规则的的对象。这里基于Python和SQL尝试进行解决。

python版本解决方案

import pandas as pd# 计算连续出现天数
def calculate_consecutive_days(df):results = []df['日期'] = pd.to_datetime(df['日期'], format='%Y%m%d')df = df.sort_values(by=['对象ID', '日期'])# 遍历每个号码for number, group in df.groupby('对象ID''):consecutive_days = 1start_date = group['日期'].iloc[0]prev_date = group['日期'].iloc[0]details = []# 遍历每个日期for date in group['日期'].iloc[1:]:if date == prev_date + pd.Timedelta(days=1):consecutive_days += 1else:if consecutive_days > 1:details.append((start_date, prev_date, consecutive_days))consecutive_days = 1start_date = dateprev_date = date# 添加最后一个连续天数if consecutive_days > 1:details.append((start_date, prev_date, consecutive_days))# 添加结果for start, end, days in details:results.append({'对象ID': number,'连续出现天数': days,'首次发现日期': start.strftime('%Y%m%d'),'末次发现日期': end.strftime('%Y%m%d'),'连续天数详情': f"{start.strftime('%Y%m%d')}-{end.strftime('%Y%m%d')}"})return results

模拟数据这里就不提供了,下面是运行结果。
在这里插入图片描述

SQL版本解决方案

【基于mysql8+版本,用到了一些mysql5版本不支持的语法~】

创建模拟数据

CREATE TABLE OccurrenceDays (uid VARCHAR(100),dt DATE );INSERT INTO OccurrenceDays (uid, dt) VALUES  
('1234567890', '2023-04-01'),  
('1234567890', '2023-04-02'),  
('1234567890', '2023-04-03'),  
('1234567890', '2023-04-05'),  
('1234567890', '2023-04-06'),  
('1234567890', '2023-04-07'),  
('1234567890', '2023-04-10'),  
('1234567890', '2023-04-11'),  
('9876543210', '2023-04-02'),  
('9876543210', '2023-04-03'),  
('9876543210', '2023-04-04');

查询脚本:

WITH RankedDays AS (  SELECT   uid,  dt,  DATE_SUB(dt, INTERVAL ROW_NUMBER() OVER (PARTITION BY uid ORDER BY dt) DAY) AS grp  FROM   OccurrenceDays  
),  
GroupedDays AS (  SELECT   uid,  MIN(dt) AS start_dt,  MAX(dt) AS end_dt,  COUNT(*) AS consecutive_days  FROM   RankedDays  GROUP BY   uid, grp  HAVING   COUNT(*) > 1  
),  
ConsecutiveDetails AS (  SELECT   g.uid,  g.consecutive_days,  g.start_dt,  g.end_dt,  GROUP_CONCAT(od.dt ORDER BY od.dt SEPARATOR ',') AS consecutive_details  FROM   GroupedDays g  JOIN   OccurrenceDays od ON g.uid = od.uid AND od.dt BETWEEN g.start_dt AND g.end_dt  GROUP BY   g.uid, g.consecutive_days, g.start_dt, g.end_dt  
)  
SELECT   uid,  consecutive_days,  DATE_FORMAT(start_dt, '%Y%m%d') AS first_discovery_date,  DATE_FORMAT(end_dt, '%Y%m%d') AS last_discovery_date,  consecutive_details  
FROM   ConsecutiveDetails  
ORDER BY   uid, start_dt;

查询结果:

uidconsecutive_daysfirst_discovery_datelast_discovery_dateconsecutive_details
1234567890320230401202304032023-04-01,2023-04-02,2023-04-03
1234567890320230405202304072023-04-05,2023-04-06,2023-04-07
1234567890220230410202304112023-04-10,2023-04-11
9876543210320230402202304042023-04-02,2023-04-03,2023-04-04

不使用CTE语法

drop  table OccurrenceDays;   
CREATE TABLE OccurrenceDays (uid VARCHAR(100),dt VARCHAR(100)
);
-- 模拟数据
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230401');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230402');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230403');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230405');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230406');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230407');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230410');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('1234567890', '20230411');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('9876543210', '20230402');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('9876543210', '20230403');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('9876543210', '20230404');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('9876543210', '20230405');
INSERT INTO OccurrenceDays (uid, dt) VALUES ('9876543210', '20230406');-- 查询脚本   
SELECT  uid,  COUNT(*) AS consecutive_days,  MIN(dt) AS first_occurrence_date,  MAX(dt) AS last_occurrence_date,  GROUP_CONCAT(dt ORDER BY dt) AS consecutive_days_detail  
FROM (  SELECT  uid,  dt,  dt - ROW_NUMBER() OVER (PARTITION BY uid ORDER BY STR_TO_DATE(dt, '%Y%m%d')) AS grp  FROM  OccurrenceDays  
) AS ranked_dates  
GROUP BY  uid, grp  
HAVING  COUNT(*) > 1  
ORDER BY  uid, MIN(dt);

输出结果:

uidconsecutive_daysfirst_occurrence_datelast_occurrence_dateconsecutive_days_detail
12345678903202304012023040320230401,20230402,20230403
12345678903202304052023040720230405,20230406,20230407
12345678902202304102023041120230410,20230411
98765432105202304022023040620230402,20230403,20230404,20230405,20230406

补充:模拟数据生成脚本

import random  
import datetime  def generate_sql_inserts(uid):  # 获取当前日期  today = datetime.date.today()  # 设定近3个月的开始日期  start_date = today - datetime.timedelta(days=90)  # 初始化SQL语句列表  sql_inserts = []  # 设定要插入的数据条数,这里可以调整  num_inserts = 30  # 用于追踪连续日期的计数器  consecutive_counter = 0  last_date = None  for _ in range(num_inserts):  # 生成随机日期,在最近三个月内  random_date = start_date + datetime.timedelta(days=random.randint(0, (today - start_date).days))  random_date_str = random_date.strftime('%Y%m%d')  # 以一定概率生成连续日期  if last_date and random.random() < 0.5:  # 假设有50%的概率生成连续日期  random_date_str = (last_date + datetime.timedelta(days=1)).strftime('%Y%m%d')  consecutive_counter += 1  else:  consecutive_counter = 0  last_date = datetime.datetime.strptime(random_date_str, '%Y%m%d').date()  # 生成SQL INSERT语句  sql_insert = f"INSERT INTO OccurrenceDays (uid, dt) VALUES ('{uid}', '{random_date_str}');"  sql_inserts.append(sql_insert)  return "\n".join(sql_inserts)  # 示例使用  
uid = "a"  
sql_script = generate_sql_inserts(uid)  
print(sql_script)

http://www.ppmy.cn/devtools/38303.html

相关文章

Unity数据持久化之Json

Json概述 Json是什么? 全称:JavaScript对象简谱(JavaScript Object Notation) Json是国际通用的一种轻量级的数据交换格式 主要在网络通讯中用于传输数据,或本地数据存储和读取 易于人阅读和编写,同时也易于机器解析和生成,并有效地提升网络传输效率 我们一般使用Json文件来…

【Linux】常用基本指令

目录 食用说明 用户管理 whoami/who clear tree 目录结构和路径 pwd ls 文件 隐藏文件 常用选项 cd 家目录、根目录、绝对路径和相对路径 touch 常用选项 mkdir rmdir/rm man cp mv cat nano echo 输出重定向 > 输入重定向 < more/less head/…

MySQL-数据缓冲池(Buffer Pool)

InnoDB存储引擎以 页 为单位管理存储空间&#xff0c;增删改查的本质就是访问页面。为提高查询效率&#xff0c;DBMS会占用内存作为缓冲池&#xff0c;在执行SQL之前&#xff0c;会将磁盘上的页 缓存到内存中的 缓冲池&#xff08;Buffer Pool&#xff09;后执行相关SQL语句。 …

C++ Primer 中文版(第 5 版)- 第一单元练习

第一单元 练习 1.1 编写程序&#xff0c;在标准输出上打印Hello, World。 #include <iostream>int main(int argc, const char * argv[]) {std::cout << "Hello, ceshi!\n";return 0; }练习 1.2 我们编写程序使用乘法运算符*&#xff0c;来打印两个数…

【Android】源码解析Activity的结构分析

源码解析Activity的结构分析 目录 1、Activity、View、Window有什么关联&#xff1f;2、Activity的结构构建流程3 源码解析Activity的构成 3.1 Activity的Attach方法3.2 Activity的OnCreate 4、WindowManager与View的关系总结 1、一个Activity对应几个WindowManage&#xff0…

webrtc应用举例

WebRTC&#xff08;Web Real-Time Communication&#xff09;是一种用于在Web浏览器和移动应用程序之间进行实时通信的开放标准。以下是一些WebRTC应用的例子&#xff1a; 1. **WebRTC电话**&#xff1a;用户可以通过Web浏览器进行点对点的音频和视频通话&#xff0c;无需安装…

VS2019下使用MFC完成科技项目管理系统

背景&#xff1a; &#xff08;一&#xff09;实验目的 通过该实验&#xff0c;使学生掌握windows程序设计的基本方法。了解科技项目组织管理的主要内容和管理方面的基本常识&#xff0c;熟练应用数据库知识&#xff0c;通过处理过程对计算机软件系统工作原理的进一步理解&…

docker容器 怎么查看运行日志

在Docker中&#xff0c;查看容器的运行日志可以使用docker logs命令。该命令允许你获取容器的日志输出&#xff0c;这对于调试和监控容器的状态非常有用。 以下是docker logs命令的一些常用用法&#xff1a; 基本用法 docker logs [OPTIONS] CONTAINERCONTAINER 是容器的ID或…