PostgreSQL中索引是否存储空值?
偶然在PostgreSQL官方文档上看到这句话:an IS NULL or IS NOT NULL condition on an index column can be used with a B-Tree index。
is not null好理解,建上索引可以走,但是is null竟然也可以走索引。据我所知,在oracle里索引是不存储null值的,所以is null走不了索引,在pg里is null可以走索引,说明null值在索引里面也进行了存储。下面分别对pg和oracle进行测试验证。
在pg和oracle中分别创建test表,初始化数据
test=# create table test(c1 int,c2 int default null);
CREATE TABLE
test=# insert into test values(1,1);
INSERT 0 1
test=# insert into test select * from test;
INSERT 0 1
test=# insert into test select * from test;
INSERT 0 2
test=# insert into test select * from test;
INSERT 0 4
...
test=# insert into test select * from test;
INSERT 0 2097152
test=# select count(*) from test;
count
---------
4194304
(1 row)
再插入一行,c2为空值
test=# insert into test(c1) values(2);
INSERT 0 1
c2列创建索引,收集统计信息
pg
test=# create index on test(c2);
CREATE INDEX
test=# analyze test;
ANALYZE
oracle
SQL> create index idx_test_c2 on test(c2);
Index created.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS',tabname=>'TEST',method_opt=>'for all indexed columns');
PL/SQL procedure successfully completed.
执行查询查看执行计划
pg
test=# explain select * from test where c2 is null;
QUERY PLAN
------------------------------------------------------------------------
Index Scan using test_c2_idx on test (cost=0.43..4.45 rows=1 width=8)
Index Cond: (c2 IS NULL)
rows)
oracle
SQL> explain plan for select * from test where c2 is null;
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 1769 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1 | 6 | 1769 (3)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("C2" IS NULL)
13 rows selected.
从上面执行计划对比可以看到pg走了索引,oracle没走索引,因此也验证了pg的btree索引是可以存储空值的。笔者也验证过mysql的btree索引也是存储空值的。
其实这引出来一个问题:索引到底应不应该存储空值?其实我个人觉得不应该存储,oracle里索引不存储null值应该也是经过考虑后做的优化。因为在实际业务场景下,某个字段is null这一类的查询基本不会出现,没有实际意义,而且null值在实际场景里面会很多,很多字段都可能是null,如果这些null值都在索引键里面都进行存储,那么大大增加了索引的大小,降低了索引扫描的效率,所以把null值排除在索引之外是一个优化,也希望未来pg能将这个功能引入。
评论