PostgreSQL 数据库NULL值的默认排序行为与查询、索引定义规范

背景

在数据库中NULL值是指UNKNOWN的值，不存储任何值，在排序时，它排在有值的行前面还是后面通过语法来指定。

例如

-- 表示null排在有值行的前面  
select * from tbl order by id nulls first;  -- 表示null排在有值行的后面  
select * from tbl order by id nulls last;

同时对于有值行，可以指定顺序排还是倒序排。

-- 表示按ID列顺序排  
select * from tbl order by id [asc];  -- 表示按ID列倒序排  
select * from tbl order by id desc;

默认的排序规则如下：

desc nulls first : null large small    asc nulls last : small large null

当nulls [first|last]与asc|desc组合起来用时，是这样的。

值的顺序如下：

1、DEFAULT:（认为NULL比任意值都大）

desc nulls first : 顺序：null large small    asc nulls last   : 顺序：small large null

2、NON DEFAULT: （认为NULL比任意值都小）

desc nulls last : 顺序：large small null       asc nulls first : 顺序：null small large

由于索引是固定的，当输入排序条件时，如果排序条件与索引的排序规则不匹配时，会导致无法使用索引的实惠（顺序扫描）。导致一些不必要的麻烦。

索引定义与扫描定义不一致引发的问题

1、建表，输入测试数据

create table cc(id int not null);  insert into cc select generate_series(1,1000000);

2、建立索引（使用非默认配置，null比任意值小）

create index idx_cc on cc (id asc nulls first);  或  create index idx_cc on cc (id desc nulls last);

3、查询，与索引定义的顺序（指NULL的相对位置）不一致时，即使使用索引，也需要重新SORT。

select * from table order by id desc nulls first limit 1;   
select * from table order by id [asc] nulls last limit 1;

用到了额外的SORT

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id limit 1;  QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  Limit  (cost=27969.43..27969.43 rows=1 width=4) (actual time=263.972..263.972 rows=1 loops=1)  Output: id  Buffers: shared hit=7160  ->  Sort  (cost=27969.43..30469.43 rows=1000000 width=4) (actual time=263.970..263.970 rows=1 loops=1)  Output: id  Sort Key: cc.id  Sort Method: top-N heapsort  Memory: 25kB  Buffers: shared hit=7160  ->  Bitmap Heap Scan on public.cc  (cost=8544.42..22969.42 rows=1000000 width=4) (actual time=29.927..148.733 rows=1000000 loops=1)  Output: id  Heap Blocks: exact=4425  Buffers: shared hit=7160  ->  Bitmap Index Scan on idx_cc  (cost=0.00..8294.42 rows=1000000 width=0) (actual time=29.380..29.380 rows=1000000 loops=1)  Buffers: shared hit=2735  Planning time: 0.098 ms  Execution time: 264.009 ms  
(16 rows)

3、查询，与索引定义一致（指NULL的相对位置）时，索引有效，不需要额外SORT。

select * from table order by id desc nulls last limit 1;   
select * from table order by id [asc] nulls first limit 1;

不需要额外SORT

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id nulls first limit 1;  QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  Limit  (cost=0.42..0.45 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)  Output: id  Buffers: shared hit=4  ->  Index Only Scan using idx_cc on public.cc  (cost=0.42..22719.62 rows=1000000 width=4) (actual time=0.013..0.013 rows=1 loops=1)  Output: id  Heap Fetches: 1  Buffers: shared hit=4  Planning time: 0.026 ms  Execution time: 0.022 ms  
(9 rows)