【MySQL】联合索引的使用

1、背景

联合索引就是给多个列建一个索引，使用联合索引时要满足最左匹配原则，不然会索引失效，本篇文章就通过explain执行计划研究一下联合索引，能让我们避免使用联合索引的一些坑。

2、数据示例

创建一张表，有三个索引：主键索引、普通索引、联合索引，我们重点关注联合索引：

CREATE TABLE test1
(id INT AUTO_INCREMENT PRIMARY KEY,str1 VARCHAR(255) NOT NULL DEFAULT '',str2 VARCHAR(255),str3 CHAR(5),str4 VARCHAR(255),str5 CHAR(10),INDEX idx_str1 (str1),INDEX idx_str4_str5 (str4, str5)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

插入100条数据：

INSERT INTO test1 (str1, str2, str3, str4, str5) VALUES('value1', 'data1', 'abc', 'value4_1', 'value5_1'),('value2', 'data2', 'def', 'value4_2', 'value5_2'),('value3', 'data3', 'ghi', 'value4_3', 'value5_3'),('value4', 'data4', 'jkl', 'value4_4', 'value5_4'),('value5', 'data5', 'mno', 'value4_5', 'value5_5'),('value6', 'data6', 'pqr', 'value4_6', 'value5_6'),('value7', 'data7', 'stu', 'value4_7', 'value5_7'),('value8', 'data8', 'vwx', 'value4_8', 'value5_8'),('value9', 'data9', 'yz1', 'value4_9', 'value5_9'),('value10', 'data10', 'yz2', 'value4_10', 'value5_10'),('value11', 'data11', 'yz3', 'value4_11', 'value5_11'),('value12', 'data12', 'yz4', 'value4_12', 'value5_12'),....('value98', 'data98', 'yz90', 'value4_98', 'value5_98'),('value99', 'data99', 'yz91', 'value4_99', 'value5_99'),('value100', 'data100', 'yz92', 'value4_100', 'value5_100');

3、联合索引B+树结构

可以用如下图来表示联合索引的B+树结构：

在这里插入图片描述
解释一下上面的图：

1、蓝色部分表示InnoDB的基本存储单位"页"，页上的绿色部分代表目录项记录或者用户记录。
2、从上往下，第一层是非叶子节点，每个页上存储目录项记录，第二层是叶子节点，每个页上存储的是目录项记录。
3、目录项记录或用户记录会存储联合索引的str4列、str5列、主键id列。
4、同一个页上记录根据str4列和str5列的大小从左往右顺序存储，同一层叶子节点或非叶子节点的所有页也是根据str4列和str5列的大小从左往右顺序存储，并且这些页组成一个双向链表。
5、联合索引的大小规则为先按照最左边的列str4的大小排列，在列str4相同的情况下再根据列str5进行排列。

4、联合索引的几种使用方式

【1】全值匹配

查询条件将联合索引中所有列都用到了就叫全值匹配，例如：

select * from test1 where str4=‘value4_32’ and str5=‘value5_32’;

mysql [xxx]> explain select * from test1 where str4='value4_32' and str5='value5_32';
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
| id   | select_type | table | type | possible_keys | key           | key_len | ref         | rows | Extra                 |
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
|    1 | SIMPLE      | test1 | ref  | idx_str4_str5 | idx_str4_str5 | 1064    | const,const | 1    | Using index condition |
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
1 row in set (0.001 sec)

可以看到组合索引被使用了，我们将查询条件的列str4和列str5调换位置，组合索引是否可以用到呢：

select * from test1 where str5=‘value4_32’ and str4=‘value5_32’;

mysql [xxx]> explain select * from test1 where str5='value4_32' and str4='value5_32';
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
| id   | select_type | table | type | possible_keys | key           | key_len | ref         | rows | Extra                 |
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
|    1 | SIMPLE      | test1 | ref  | idx_str4_str5 | idx_str4_str5 | 1064    | const,const | 1    | Using index condition |
+------+-------------+-------+------+---------------+---------------+---------+-------------+------+-----------------------+
1 row in set (0.001 sec)

可以看到组合索引依然被使用，这是因为sql执行过程中的优化器会将sql进行优化，优化之后就会优先使用列str4去查询记录。

【2】部分列匹配

只使用联合索引中的部分列作为查询条件，例如：

select * from test1 where str4=‘value5_32’;

mysql [xxx]> explain select * from test1 where str4='value4_32';
+------+-------------+-------+------+---------------+---------------+---------+-------+------+-----------------------+
| id   | select_type | table | type | possible_keys | key           | key_len | ref   | rows | Extra                 |
+------+-------------+-------+------+---------------+---------------+---------+-------+------+-----------------------+
|    1 | SIMPLE      | test1 | ref  | idx_str4_str5 | idx_str4_str5 | 1023    | const | 1    | Using index condition |
+------+-------------+-------+------+---------------+---------------+---------+-------+------+-----------------------+
1 row in set (0.001 sec)

可以看到使用最左边的列str4用到了联合索引，再试试只使用右边的列str5作为查询条件：

select * from test1 where str5=‘value5_32’;

mysql [xxx]> explain select * from test1 where str5='value5_32';
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | test1 | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.001 sec)

使用右边的列就没用到联合索引，并且rows为100，会进行全表扫描。

所以在使用联合索引的列作为查询条件时，使用最左边连续的列作为查询条件才能用到联合索引。

【3】列前缀匹配

前缀匹配可以分为：左前缀、右前缀、中间部分。模糊查询只有左前缀会生效，我们依次看看这三种情况是否使用到索引。先看左前缀：

select * from test1 where str4 like ‘xxx%’;

mysql [xxx]> explain select * from test1 where str4 like 'xxx%';
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | test1 | range | idx_str4_str5 | idx_str4_str5 | 1023    | NULL | 1    | Using index condition |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
1 row in set (0.001 sec)

可以看到左前缀用到了联合索引，再看右前缀：

select * from test1 where str4 like ‘%xxx’;

mysql [xxx]> explain select * from test1 where str4 like '%xxx';
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | test1 | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.000 sec)

可以看到全表扫描了，再看中间匹配：

explain select * from test1 where str4 like ‘%xxx%’;

mysql [xxx]> explain select * from test1 where str4 like '%xxx%';
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | test1 | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.000 sec)

也是全表扫描了。

左前缀匹配不一定会使用到索引，也有可能会全表扫描，这是因为有时候优化器认为通过索引查询的开销比全表扫描开销还要大，因为索引查找到的数据回表的也是有开销的，具体细节后面再讲。

【4】范围匹配

只有使用最左边的列进行范围匹配才会生效，例如：

select * from test1 where str4 > ‘value4_56’ and str4 < ‘value4_78’;

mysql [xxx]> explain select * from test1 where str4 > 'value4_56' and str4 < 'value4_78';
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | test1 | ALL  | idx_str4_str5 | NULL | NULL    | NULL | 100  | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.002 sec)

可以看到可能会使用到联合索引，但是最后使用了全表扫描，因为优化器认为全表扫描更快，这个时候联合索引就失效了，我们修改一下select *为select str4，也就是不需要回表，再看看是否使用到索引：

select str4 from test1 where str4 > ‘value4_56’ and str4 < ‘value4_78’;

mysql [xxx]> explain select str4 from test1 where str4 > 'value4_56' and str4 < 'value4_78';
+------+-------------+-------+-------+---------------+---------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | test1 | range | idx_str4_str5 | idx_str4_str5 | 1023    | NULL | 23   | Using where; Using index |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+--------------------------+
1 row in set (0.001 sec)

可以看到成功使用了联合索引，当左边的列精准匹配，右边的列范围匹配时也可能用到联合索引，例如：

select * from test1 where str4 = ‘value4_56’ and str5 < ‘value5_80’;

mysql [xxx]> explain select * from test1 where str4 = 'value4_56' and str5 < 'value5_80';
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | test1 | range | idx_str4_str5 | idx_str4_str5 | 1064    | NULL | 1    | Using index condition |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-----------------------+
1 row in set (0.001 sec)

可以看到联合索引被使用了。

【5】排序

根据索引列进行排序也是有可能用到索引的，例如：

select * from test1 order by str4,str5;

mysql [xxx]> explain select * from test1 order by str4,str5;
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
|    1 | SIMPLE      | test1 | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.000 sec)

可以看到根据str4和str5排序进行了全表扫描，我们限制一下数量：

select * from test1 order by str4,str5 limit 1;

MariaDB [mng]> explain select * from test1 order by str4,str5 limit 1;
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------+
|    1 | SIMPLE      | test1 | index | NULL          | idx_str4_str5 | 1064    | NULL | 1    |       |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------+
1 row in set (0.000 sec)

可以看到使用到了联合索引。

还有几种情况会用不到联合索引，写个sql带上explain验证一下就行了：
1、AESC和DESC混用。
2、非联合索引精确匹配，联合索引列范围查询。
3、排序的多个列包含非联合索引的列。
4、使用了表达式。

【6】分组

分组也可能会用到联合索引，例如:

select str4,str5,count(*) from test1 group by str4,str5;

mysql [xxx]> explain select str4,str5,count(*) from test1 group by str4,str5;
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
|    1 | SIMPLE      | test1 | index | NULL          | idx_str4_str5 | 1064    | NULL | 100  | Using index |
+------+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
1 row in set (0.001 sec)