一、每个学科的成绩第一名是谁?
0 问题描述
基于学生成绩表输出每个科目的第一名是谁呢?
1 数据准备
with t1 as(select'zs' as name,'[{"Chinese":80},{"Math":70},{"English":60}]' as score_extunion allselect'ls' as name,'[{"Chinese":90},{"Math":70},{"English":90}]' as score_extunion allselect'ww' as name,'[{"Chinese":60},{"Math":90},{"English":80}]' as score_ext),t2 as(selectname,-- 需要把 [] 和 " 和 {} 都给去掉,方面后面操作regexp_replace(score_ext, '\\[|\\{|\\}|\\"|\\]', '') as scoresfrom t1),t3 as (selectname,split(score, ":")[0] as course,split(score, ":")[1] as scorefrom t2lateral view explode(split(scores, ',')) expl as score)selectname,course,score
from (selectcourse,name,score,row_number() over (partition by course order by score desc) as rnfrom t3) t4
where rn = 1;
2 数据分析
步骤1:采用 regexp_replace(score_ext, '\\[|\\{|\\}|\\"|\\]', '') as scores,去除相关符号后,t2输出:
步骤2:基于t2表进行lateral view explode 侧写炸裂时,t3输出:
步骤3:基于t3表进行 row_number() over (partition by course order by score desc) as rn
开窗排序(分组topN),t4输出:
3 小结
该题目利用(行转列)炸裂+开窗进行 解决;用到的Hive正则表达式如下:
regexp_replace正则替换
- 语法: regexp_replace(string A, string B, string C)
- 操作类型: strings
- 返回值: string
- 说明: 将字符串A中的符合java正则表达式B的部分替换为C。
(06)Hive——正则表达式_hive 正则表达式-CSDN博客