MySQL optimization tips

2024-08-24

子查询的效率不高的原因

执行时，mysql需要为内层查询语句的查询结果建立一个临时表，然后外层查询语句从临时表中查询记录。查询完再撤销这些临时表，这样会消耗过多的CPU和IO，产生大量的慢查询。
临时表不会存在索引，所以查询性能会受到一定的影响。
对于返回结果集比较大的子查询，其对查询性能的影响也比较大。

结论是

不建议使用子查询，建议拆成多个查询，或者使用JOIN来替代子查询。join不需要建立临时表，速度比子查询快，还能使用索引。
尽量不要使用NOT IN或者NOT EXISTS，用LEFT JOIN ** ON ** WHERE ** IS NULL替代

排序优化

假设student表中有大量数据（我懒得创建了），创建联合索引（age, classid, name）

explain select * from student order by age, classid;

结果发现这条没有使用索引，为什么？因为数据量很大的时候，如果使用索引他需要大量的回表操作，然后查询优化器就决定干脆不用索引了。

explain select age,classid from student order by age, classid;

不选择全部字段，只选择索引中的字段，那么就不需要回表了，这个时候就发现使用了索引。

explain select * from student order by age, classid limit 10;

加上limit后，发现使用上了索引，原因还是之前说的，限制10条减少了回表的数据量，因此优化器使用上了索引。

需要注意⚠️：这块讨论的情况，如果数据量不一样了，结果也可能不一样。

联合索引的一些规则

比如上述的（age, classid, name），如果order by classid来查，是不会使用到索引的，需要order by age,classid才行。
升序降序不一致也无法使用到索引。

select ... order by age desc,classid asc limit 10; -- 🙅
select ... order by age asc,classid desc limit 10; -- 🙅
select ... order by age desc,classid desc limit 10; -- ✅

无过滤，不索引

explain select * from student where age=45 order by classid; -- 使用了索引，但是key_len看出其实只使用了age
explain select * from student where age=45 order by classid,name;-- 同上

-- 没有用上索引，因为他可能想如果先按照age来排好，然后再去找classid，回表成本高
explain select * from student where classid=45 order by age; 

-- 用上了索引，因为回表数据量低
explain select * from student where classid=45 order by age limit 10;

总结 INDEX a_b_c(a,b,c)

order by 能使用最做前缀

order by a
order by a,b
order by a,b,c
order by a desc, b desc, c desc

如果where使用索引的最做左前缀定义为常量，则order by能使用索引

where a = const order by b, c
where a = const and b = const order by c
where a = const and b > const order by b, c

不能使用索引进行排序

order by a asc, b desc, c desc -- 排序不一致
where g = const order by b, c -- 没有a
where a = const order by c -- 没有b
where a = const order by a, d -- d不是索引的一部分
where a in (...) order by b, c -- 范围不行

order by时使用select *是一个大忌，最好只query需要的字段。原因：

当query的字段大小总和小于max_length_for_sort_data，而且排序字段不是text|blob类型时，会改用改进后的算法——单路排序，否则用老算法——多路排序
两种算法都有可能超出sort_buffer_size的容量，超出之后，会创建tmp文件进行合并排序，导致多次io，但是用单路排序的风险更大一点，所以要提高sort_buffer_size

group by优化

group by使用索引的原则几乎跟order by一致
where效率高于having，能写在where里面就尽量不要写在having里
减少使用order by，和业务沟通能不排序就不排序，或者排序放到程序中去做。order by, group by, distinct这些语句较为消耗cpu，数据库的cpu资源是极其宝贵的
包含了order by, group by, distinct的语句，where条件过滤出来的结果请保持在1000行以内，否则sql会很慢

分页查询优化

案例：

select * from students limit 2000000, 10;

此时需要排序前2000010，然后仅仅返回后面的10条，查询代价非常大。因为是select *，所以要尽量使用上主键索引

优化思路1 在索引上完成排序分页操作，然后根据主键关联回表查询所需要的其他列内容

select * from students t, (select id from students order by id limit 2000000, 10) a where t.id = a.id;

优化思路2 适用于主键自增的表，可以把limit查询转换成某个位置的查询

select * from students where id > 2000000 limit 10;