459 lines
13 KiB
Markdown
459 lines
13 KiB
Markdown
# SQL窗口函数
|
||
|
||
### 一. 什么是窗口函数
|
||
|
||
#### 基本含义
|
||
|
||
窗口限定一个范围,它可以理解为满足某些条件的记录集合,窗口函数也就是在窗口范围内执行的函数。
|
||
|
||
#### 基本语法
|
||
|
||
窗口函数有over关键字,指定函数执行的范围,可分为三部分:分组子句(partition by),排序子句(order by),窗口子句(rows)
|
||
|
||
```sql
|
||
<函数名> over (partition by <分组的列> order by <排序的列> rows between <起始行> and <终止行>)
|
||
```
|
||
|
||
|
||
|
||
**注意Mysql8才支持窗口函数**
|
||
|
||
|
||
|
||
#### 演示表格
|
||
|
||
| cid(班级id) | sname(学生姓名) | score(分数) |
|
||
| ------------ | ----------------- | ------------- |
|
||
| 001 | 张三 | 78 |
|
||
| 001 | 李四 | 82 |
|
||
| 002 | 小明 | 90 |
|
||
| 001 | 王五 | 67 |
|
||
| 002 | 小红 | 85 |
|
||
| 002 | 小刚 | 62 |
|
||
|
||
#### 演示脚本
|
||
|
||
```sql
|
||
CREATE TABLE SQL_5 (
|
||
cid varchar(4),
|
||
sname varchar(4),
|
||
score int
|
||
);
|
||
|
||
insert into SQL_5 (cid, sname, score) values ('001', '张三', 78);
|
||
insert into SQL_5 (cid, sname, score) values ('001', '李四', 82);
|
||
insert into SQL_5 (cid, sname, score) values ('002', '小明', 90);
|
||
insert into SQL_5 (cid, sname, score) values ('001', '王五', 67);
|
||
insert into SQL_5 (cid, sname, score) values ('002', '小红', 85);
|
||
insert into SQL_5 (cid, sname, score) values ('002', '小刚', 62);
|
||
```
|
||
|
||
|
||
|
||
### 二. 窗口的确定
|
||
|
||
例子:
|
||
|
||
```sql
|
||
select *, sum(score) over (partition by cid order by score rows between unbounded preceding and unbounded following) as '班级总分' from SQL_5;
|
||
```
|
||
|
||
#### 分组子句(partition by)
|
||
|
||
不分组可以写成partition by null或者直接不写
|
||
|
||
后面可以跟多个列, 如 partition by cid, sname
|
||
|
||
**注意 partition by与group by的区别**
|
||
|
||
1)前者不会压缩行数但是后者会
|
||
|
||
2)后者只能选取分组的列和聚合的列
|
||
|
||
也就是说group by 后生成的结果集与原表的行数和列数都不同
|
||
|
||
#### 排序子句(order by)
|
||
|
||
不排序可以写成order by null 或者直接不写
|
||
|
||
asc或不写表示升序,desc表示降序
|
||
|
||
后面可以跟多个列, 如 order by cid, sname
|
||
|
||
#### 窗口子句(rows)
|
||
|
||
窗口子句的描述
|
||
|
||
1) 起始行: N preceding/unbounded preceding
|
||
|
||
2) 当前行: current row
|
||
|
||
3) 终止行: N following/unbounded following
|
||
|
||
举例:
|
||
|
||
rows between unbounded preceding and current row 从之前所有的行到当前行
|
||
|
||
rows between 2 preceding and current row 从前面两行到当前行
|
||
|
||
rows between current row and unbounded following 从当前行到之后所有的行
|
||
|
||
rows between current row and 1following 从当前行到后面一行
|
||
|
||
**注意:**
|
||
|
||
**排序子句后面缺少窗口子句,窗口规范默认是 rows between unbounded preceding and current row**
|
||
|
||
**排序子句和窗口子句都缺失,窗口规范默认是 rows between unbounded preceding and unbounded following**
|
||
|
||
#### 总体流程
|
||
|
||
1) 通过partition by 和 order by 子句确定大窗口( 定义出上界unbounded preceding和下界unbounded following)
|
||
|
||
2) 通过row 子句针对每一行数据确定小窗口(滑动窗口)
|
||
|
||
3) 对每行的小窗口内的数据执行函数并生成新的列
|
||
|
||
|
||
|
||
### 三. 函数分类
|
||
|
||
#### 排序类
|
||
|
||
rank, dense_rank, row_number
|
||
|
||
```sql
|
||
-- 【排序类】
|
||
-- 按班级分组后打上序号 不考虑并列
|
||
select *, row_number() over (partition by cid order by score desc) as '不可并列排名' from SQL_5;
|
||
-- 按班级分组后作跳跃排名 考虑并列
|
||
select *, rank() over (partition by cid order by score desc) as '跳跃可并列排名' from SQL_5;
|
||
-- 按班级分组后作连续排名 考虑并列
|
||
select *, dense_rank() over (partition by cid order by score desc) as '连续可并列排名' from SQL_5;
|
||
-- 合并起来对比
|
||
select *, row_number() over (partition by cid order by score desc) as '不可并列排名' ,
|
||
rank() over (partition by cid order by score desc) as '跳跃可并列排名',
|
||
dense_rank() over (partition by cid order by score desc) as '连续可并列排名'
|
||
from SQL_5;
|
||
```
|
||
|
||
#### 聚合类
|
||
|
||
sum. avg, count, max, min
|
||
|
||
```sql
|
||
-- 【聚合类】
|
||
-- 让同一班级每个学生都知道班级总分是多少
|
||
select *, sum(score) over (partition by cid) as '班级总分' from SQL_5;
|
||
-- 或者可以写成
|
||
select *, sum(score) over (partition by cid rows between unbounded preceding and unbounded following) as '班级总分' from SQL_5;
|
||
|
||
-- 计算同一班级,每个同学和比他分数低的同学的累计总分是多少
|
||
select *, sum(score) over (partition by cid order by score) '累加分数' from SQL_5;
|
||
-- 或者可以写成 其中rows between ... and 是规定窗口大小
|
||
select *, sum(score) over (partition by cid order by score rows between unbounded preceding and current row) as '累加分数' from SQL_5;
|
||
```
|
||
|
||
#### 跨行类
|
||
|
||
lag, lead
|
||
|
||
```sql
|
||
-- 【跨行类】
|
||
-- lag/lead 函数 参数1:比较的列 参数2: 偏移量 参数3:找不到的默认值
|
||
-- 同一班级内,成绩比自己低一名的分数是多少
|
||
select *, lag(score, 1) over (partition by cid order by score) as '低一名的分数' from SQL_5;
|
||
-- 或者写成
|
||
select *, lag(score, 1, 0) over (partition by cid order by score) as '低一名的分数' from SQL_5;
|
||
|
||
-- 同一班级内,成绩比自己高2名的分数是多少
|
||
select *, lead(score, 2) over (partition by cid order by score) as '高两名的分数' from SQL_5;
|
||
```
|
||
|
||
|
||
|
||
### 四. 相关题目
|
||
|
||
#### 表格
|
||
|
||
| cid | sname | course | score |
|
||
| ---- | ----- | ------ | ----- |
|
||
| 001 | 张三 | 语文 | 78 |
|
||
| 002 | 小刚 | 语文 | 71 |
|
||
| 001 | 李四 | 数学 | 56 |
|
||
| 002 | 小明 | 数学 | 54 |
|
||
| ... | ... | ... | ... |
|
||
|
||
#### 脚本
|
||
|
||
```sql
|
||
CREATE TABLE SQL_6 (
|
||
cid varchar(4),
|
||
sname varchar(4),
|
||
course varchar(10),
|
||
score int
|
||
);
|
||
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '张三', '语文', 78);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小刚', '语文', 71);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '李四', '数学', 56);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '王五', '数学', 97);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小明', '数学', 54);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小刚', '数学', 67);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小红', '数学', 82);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '王五', '语文', 80);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '张三', '数学', 77);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小明', '语文', 58);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小红', '语文', 87);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '李四', '语文', 60);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '张三', '英语', 66);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小刚', '英语', 50);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '李四', '地理', 59);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '王五', '地理', 88);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小明', '地理', 45);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小刚', '地理', 66);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小红', '地理', 82);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '王五', '英语', 81);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '张三', '地理', 77);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小明', '英语', 55);
|
||
insert into SQL_6 (cid, sname, course, score) values ('002', '小红', '英语', 87);
|
||
insert into SQL_6 (cid, sname, course, score) values ('001', '李四', '英语', 61);
|
||
```
|
||
|
||
##### 分组内topN
|
||
|
||
问题1:求出每个学生成绩最高的三条记录
|
||
|
||
```sql
|
||
select * from
|
||
(
|
||
select *, row_number() over (partition by sname order by score desc) as rn from SQL_6
|
||
) temp
|
||
where rn <= 3
|
||
```
|
||
|
||
###### 公式:
|
||
|
||
```sql
|
||
select * from
|
||
(
|
||
select *, row_number() over (partition by 分组列 order by 比较列) as rn from table
|
||
) as tmp
|
||
where rn <= N;
|
||
```
|
||
|
||
|
||
|
||
##### 汇总分析
|
||
|
||
问题2:找出每门课程都高于班级课程平均分的学生
|
||
|
||
可以拆解成以下几个问题:
|
||
|
||
1)求出每个班级,每门课程的平均分
|
||
|
||
```sql
|
||
with
|
||
-- 1) 求出每个班级,每门课程的平均分
|
||
t1 as
|
||
(
|
||
select *,
|
||
avg(score) over (partition by cid, course) as 'avg'
|
||
from SQL_6
|
||
),
|
||
```
|
||
|
||
2)将学生每门课程的成绩与所在班级的对应课程平均分相减,结果大于0就说明该学生的这门成绩高于课程平均分
|
||
|
||
```sql
|
||
t2 as (
|
||
select *,
|
||
score - avg as 'del'
|
||
from t1
|
||
)
|
||
```
|
||
|
||
3)“找出每门课程都高于班级课程平均分的学生”说明对于学生来说,最小的“相减结果”都是大于0的
|
||
|
||
```sql
|
||
select sname from t2
|
||
group by sname
|
||
having min(del) > 0;
|
||
```
|
||
|
||
合并后的SQL语句
|
||
|
||
```sql
|
||
with
|
||
t1 as
|
||
(
|
||
select *,
|
||
avg(score) over (partition by cid, course) as 'avg'
|
||
from SQL_6
|
||
),
|
||
t2 as (
|
||
select *,
|
||
score - avg as 'del'
|
||
from t1
|
||
)
|
||
select sname from t2
|
||
group by sname
|
||
having min(del) > 0;
|
||
|
||
-- 或者
|
||
select sname from (
|
||
select *,
|
||
score - avg as 'del'
|
||
from (
|
||
select *,
|
||
avg(score) over (partition by cid, course) as 'avg'
|
||
from SQL_6
|
||
) t1
|
||
) t2
|
||
group by sname
|
||
having min(del) > 0;
|
||
```
|
||
|
||
|
||
|
||
#### 表格
|
||
|
||
| empno | ename | hire_date | salary | dept_no |
|
||
| ----- | ----- | ---------- | ------ | ------- |
|
||
| 001 | Adam | 2018-03-01 | 1000 | A |
|
||
| 002 | Bill | 2021-03-01 | 1200 | A |
|
||
| 003 | Cindy | 2016-03-01 | 1500 | A |
|
||
| 004 | Danny | 2020-03-01 | 5000 | A |
|
||
| 005 | Eason | 2020-03-01 | 4000 | B |
|
||
| 006 | Fred | 2018-03-01 | 3500 | B |
|
||
| 007 | Gary | 2017-03-01 | 1800 | B |
|
||
| 008 | Hugo | 2020-03-01 | 2500 | B |
|
||
|
||
#### 脚本
|
||
|
||
```sql
|
||
CREATE TABLE SQL_7 (
|
||
empno varchar(4),
|
||
ename varchar(10),
|
||
hire_date varchar(10),
|
||
salary int,
|
||
dept_no varchar(2)
|
||
);
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('001', 'Adam', '2018-03-01', 1000, 'A');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('002', 'Bill', '2021-03-01', 1200, 'A');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('003', 'Cindy', '2016-03-01', 1500, 'A');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('004', 'Danny', '2020-03-01', 5000, 'A');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('005', 'Eason', '2020-03-01', 4000, 'B');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('006', 'Fred', '2018-03-01', 3500, 'B');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('007', 'Gary', '2017-03-01', 1800, 'B');
|
||
insert into SQL_7 (empno, ename, hire_date, salary, dept_no) values ('008', 'Hugo', '2020-03-01', 4500, 'B');
|
||
|
||
select * from SQL_7;
|
||
```
|
||
|
||
##### 分组内topN
|
||
|
||
问题一:求出每个部门工资最高的前三名员工
|
||
|
||
```sql
|
||
select * from
|
||
(
|
||
select *, row_number() over (partition by dept_no order by salary desc) as rn from SQL_7
|
||
) as tmp
|
||
where rn <= 3;
|
||
```
|
||
|
||
|
||
|
||
##### 汇总分析
|
||
|
||
问题二:计算这些员工的工资占所属部门总工资的百分比
|
||
|
||
```sql
|
||
with
|
||
t1 as (
|
||
select * , sum(salary) over (partition by dept_no) as 'sum_sal' from SQL_7
|
||
),
|
||
t2 as (
|
||
select *, round(salary*100/sum_sal,2) as 'percentage' from t1
|
||
)
|
||
select * from t2;
|
||
```
|
||
|
||
问题三:对各部门员工的工资进行从小到大排序,排名前30%为低层,30%-80%为中层,高于80%为高层,并打上标签
|
||
|
||
```label
|
||
with
|
||
t1 as (
|
||
select * , row_number() over (partition by dept_no order by salary) as cnt,
|
||
count(empno) over (partition by dept_no) as 'sum' from SQL_7
|
||
),
|
||
t2 as (
|
||
select *, round(cnt/sum,2) as 'percentage' from t1
|
||
),
|
||
t3 as (
|
||
select *, case when percentage <= 0.3 then '低层'
|
||
when percentage <= 0.8 then '中层'
|
||
when percentage <= 1 then '高层' end as 'label'
|
||
from t2
|
||
)
|
||
select empno, ename, hire_date, salary, dept_no, label from t3;
|
||
```
|
||
|
||
问题四:统计每年入职总数以及截至本年累计入职总人数(本年总入职人数 + 本年之前所有年的总入职人数之和)
|
||
|
||
```sql
|
||
select year(hire_date) as hire_year, count(empno) as cnt
|
||
from SQL_7
|
||
group by year(hire_date) order by hire_year;
|
||
```
|
||
|
||
```sql
|
||
with t1 as (
|
||
select year(hire_date) as hire_year, count(empno) as cnt from SQL_7 group by year(hire_date) order by hire_year
|
||
)
|
||
|
||
select *, sum(cnt) over(partition by null rows between unbounded preceding and current row) as sum from t1;
|
||
```
|
||
|
||
|
||
|
||
### 五. 技巧
|
||
|
||
1)分组内topN公式
|
||
|
||
```sql
|
||
select * from
|
||
(
|
||
select *, row_number() over (partition by 分组列 order by 比较列) as rn from table
|
||
) as tmp
|
||
where rn <= N;
|
||
```
|
||
|
||
2) 窗口函数 -> 生成辅助列(相当于高级语言的临时变量)
|
||
|
||
3) with 语句 -> 生成临时表(相当于高级语言的局部方法)
|
||
|
||
把复杂的问题拆分成多个子问题并用临时表去表达
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|