clickhouse分组排序,行号,取特定数量数据

news/2024/11/28 6:13:56/

文章目录

      • 1、源数据
      • 2、生成数组
        • 2.1 groupArray 分组合并为数组
        • 2.2 arrayEnumerate 标记数据
      • 3、rank()、row_number()
        • 3.1 说明
        • 3.2 使用

目前应用很多需求设计对数据分组并去特定数量的数据;
clickhouse 新版本增加了row_number(),rank() 函数,可以直接对分组数据添加行号;下面是记录了这两个函数的基本使用;另外用 groupArray方式也大概实现了添加行号(官网上有说 groupArrayLast 可以实现自动排序的功能),但是实测这函数没法使用,所以暂不记录;

1、源数据

select 'a' as name, 25 as age,165 as height union all
select 'b' as name, 21 as age,182 as height union all
select 'a' as name, 21 as age,187 as height union all
select 'a' as name, 25 as age,158 as height union all
select 'b' as name, 22 as age,168 as height

2、生成数组

2.1 groupArray 分组合并为数组

groupArray 会把同类型的值合并为数组,并过滤NULL值数据;格式groupArray(max_size)(fields)

-- 不限制分组数量
select name,groupArray(age) from 
(select 'a' as name, 25 as age,165 as height union all select 'b' as name, 21 as age,182 as height union all select 'a' as name, 21 as age,187 as height union all select 'a' as name, 25 as age,158 as height union all select 'b' as name, 22 as age,168 as height
) a group by name;
namegroupArray(age)
b[21,22]
a[25,21,25]
-- 限制分组数量为1
-- 如果先对内部数据排序再分组可拿取age最大的一条
-- arrayStringConcat 将数据用特定字符合并
select name,groupArray(1)(age),arrayStringConcat(groupArray(1)(age),'') from 
(select * from (select 'a' as name, 25 as age,165 as height union all select 'b' as name, 21 as age,182 as height union all select 'a' as name, 21 as age,187 as height union all select 'a' as name, 25 as age,158 as height union all select 'b' as name, 22 as age,168 as height) a  order by age desc 
) r group by name;
namegroupArray(1)(age)arrayStringConcat(groupArray(1)(age),‘’)
b[22]22
a[25]25
2.2 arrayEnumerate 标记数据
select name,groupArray(age) as values,arrayEnumerate(values) as indexs from 
(select 'a' as name, 25 as age,165 as height union all select 'b' as name, 21 as age,182 as height union all select 'a' as name, 21 as age,187 as height union all select 'a' as name, 25 as age,158 as height union all select 'b' as name, 22 as age,168 as height
) a group by name;
namevaluesindexs
b[21,22][1,2]
a[25,21,25][1,2,3]

3、rank()、row_number()

3.1 说明

https://clickhouse.com/docs/en/sql-reference/window-functions

3.2 使用
--rank()
select name,age,rank() over(partition by name order by age asc ) from 
(select 'a' as name, 25 as age,165 as height union all select 'b' as name, 21 as age,182 as height union all select 'a' as name, 21 as age,187 as height union all select 'a' as name, 25 as age,158 as height union all select 'b' as name, 22 as age,168 as height
) a group by name,age;-- row_number()
select name,age,rank() over(partition by name order by age asc ) from 
(select 'a' as name, 25 as age,165 as height union all select 'b' as name, 21 as age,182 as height union all select 'a' as name, 21 as age,187 as height union all select 'a' as name, 25 as age,158 as height union all select 'b' as name, 22 as age,168 as height
) a group by name,age;
nameagerank() OVER (PARTITION BY name ORDER BY age ASC)
a211
a252
b211
b222

PS. 行号已经标明,后续需要取多少数据设置行号条件即可


http://www.ppmy.cn/news/1121378.html

相关文章

基于Java+SpringBoot+Vue+协同过滤算法的电影推荐系统(亮点:智能推荐、协同过滤算法、在线支付、视频观看)

协同过滤算法的电影推荐系统 一、前言二、我的优势2.1 自己的网站2.2 自己的小程序(小蔡coding)2.3 有保障的售后2.4 福利 三、开发环境与技术3.1 MySQL数据库3.2 Vue前端技术3.3 Spring Boot框架3.4 微信小程序 四、功能设计4.1 主要功能描述 五、系统实…

Java反序列化和php反序列化的区别

文章目录 PHP反序列化漏洞反序列化漏洞什么是反序列化漏洞?修改序列化后的数据,目的是什么? Java反序列化漏洞反序列化漏洞 PHP反序列化漏洞 序列化存在的意义是为了传输数据/对象,类是无法直接进行传输的。通过序列化后转换为字…

QT--day5

注册 mainwindow.h #ifndef MAINWINDOW_H #define MAINWINDOW_H#include <QMainWindow> #include<QPushButton> #include<QLineEdit> #include<QLabel> #include <QMessageBox> #include<QString> #include<QSqlDatabase> …

操作系统篇之虚拟内存

虚拟内存是什么? 虚拟内存是计算机操作系统中的一种技术&#xff0c;它将每个进程的内存空间划分成若干个固定大小的页&#xff0c;并通过页面映射技术将这些页与物理内存或磁盘上的页面文件进行交换 虚拟内存能干什么? 扩展了实际物理内存容量&#xff1a;虚拟内存使得每个…

pycharm 让控制台里的链接可以点击

前言 如果细心就会发现pychram控制台里一些链接是可以点击的,另外一些不行,那么如果让输出的链接可以点击如何做呢? 解决 输出的i链接会在控制台里可以点击,并且点击会在本地直接打开 如果打印的是网址则可以直接点击 print(file:///{}.format(i))print(https://www.baid…

电池厂提供excel电池曲线zcv到mtk电池曲线zcv转换

#encoding:utf8 #电池厂提供excel电池曲线zcv到mtk电池曲线zcv转换 import pandas as pd import openpyxl import math # 读取Excel文件 df pd.read_excel("a55-zcv.xlsx") for j in range(0,10): if(j<3): offset0 #T0~T2 if(j3): offset…

多旋翼无人机组合导航系统-多源信息融合算法(Matlab代码实现)

&#x1f4a5;&#x1f4a5;&#x1f49e;&#x1f49e;欢迎来到本博客❤️❤️&#x1f4a5;&#x1f4a5; &#x1f3c6;博主优势&#xff1a;&#x1f31e;&#x1f31e;&#x1f31e;博客内容尽量做到思维缜密&#xff0c;逻辑清晰&#xff0c;为了方便读者。 ⛳️座右铭&a…

手游模拟器长时间运行后,游戏掉帧且不恢复

1&#xff09;手游模拟器长时间运行后&#xff0c;游戏掉帧且不恢复 2&#xff09;FrameBuffer Fetch无论哪种模式在确定支持的手机上显示全紫 3&#xff09;协程中yield return CoFunction()和yield return StartCoroutine(CoFunction())的区别 这是第353篇UWA技术知识分享的推…