MySQL上表分区的试验

问题描述

生产库上关注表(focus),目前单表数据量级去到了1600w条。但很多是无效的数据,实际有用的数据是半年。通过分区的方式来优化这个表,加速数据查询速度。

操作过程思路

新建一张分区表,以flg_date字段来进行分区选择,以一个季度的数据分一个区。然后把旧表的数据导入到新表,修改旧表表名,再把新表改回旧表的名字。
以flg_date字段进行分区,在DAO层的查询中,select共出现18次,查询条件带flg_date的,出现9次。

试验环境(虚拟机)

mysql版本:5.6.40
操作系统:Linux CentOS-7.0
内存:1G
内核:1
硬盘:20G

创建分区表(csmbp_flight_focus_p)

因默认分区限制分区字段必须是主键(PRIMARY KEY)的一部分,需要把id 、flt_date一起设为该表主键,使用联合主键。focus表结构中,有7个索引,1个主键索引,一个普通索引,5个组合索引,新建的分区表,暂不加入旧表中的索引。按季度分区,一次分十年。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
CREATE TABLE `focus_p` (
`id` varchar(64) NOT NULL,
`flt_date` datetime NOT NULL,
PRIMARY KEY (`id`, `flt_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (TO_DAYS(flt_date) ) (
PARTITION focus_part_20180101 VALUES LESS THAN (TO_DAYS('20180101')),
PARTITION focus_part_20180401 VALUES LESS THAN (TO_DAYS('20180401')),
PARTITION focus_part_20180701 VALUES LESS THAN (TO_DAYS('20180701')),
PARTITION focus_part_20181001 VALUES LESS THAN (TO_DAYS('20181001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20190101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20190401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20190701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20191001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20200101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20200401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20200701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20201001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20210101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20210401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20210701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20211001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20220101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20220401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20220701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20221001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20230101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20230401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20230701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20231001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20240101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20240401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20240701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20241001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20250101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20250401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20250701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20251001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20260101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20260401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20260701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20261001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20270101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20270401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20270701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20271001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20280101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20280401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20280701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20281001')),
PARTITION focus_part_20190101 VALUES LESS THAN (TO_DAYS('20290101')),
PARTITION focus_part_20190401 VALUES LESS THAN (TO_DAYS('20290401')),
PARTITION focus_part_20190701 VALUES LESS THAN (TO_DAYS('20290701')),
PARTITION focus_part_20191001 VALUES LESS THAN (TO_DAYS('20291001'))
);

查看分区表

1
2
3
4
5
6
7
8
9
10
SELECT
partition_name part,
partition_expression expr,
partition_description descr,
table_rows
FROM
INFORMATION_SCHEMA.partitions
WHERE
TABLE_SCHEMA = schema()
AND TABLE_NAME='focus_p';

添加测试数据(focus)

使用存储过程插入非空字段的数据,插入1600W条数据。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
delimiter //
SET autocommit=0;
CREATE PROCEDURE load_part_tab()
BEGIN
DECLARE v int DEFAULT 0;
WHILE v < 16000000
DO
INSERT INTO csmbp_flight_focus(id,flt_date)
VALUES (v,adddate('2013-01-01 00:00:00',(rand(v)*36520) mod 3652));
SET v = v + 1;
END WHILE;
COMMIT;
END
//
delimiter ;
CALL load_part_tab();

旧表数据转移至新的分区表

1
INSERT INTO focus_p SELECT * FROM focus;

总耗时12分6秒。

再次查看分区的存储记录情况:

验证查询对比

  • 查focus表:耗时2分54秒
    1
    2
    SELECT * FROM focus f WHERE f.flt_date >= '2017-12-16 00:00:00'
    AND f.flt_date <= '2018-11-08 00:00:00';

  • 查focus_p表:耗时28秒
    1
    2
    SELECT * FROM focus_p f WHERE f.flt_date >= '2017-12-16 00:00:00'
    AND f.flt_date <= '2018-11-08 00:00:00';

分区管理

  • 删除分区:ALTER TABLE focus_p DROP PARTITION p1;
  • 分区合并:ALTER TABLE focus_p REORGANIZE PARTITION p1,p2 INTO (PARTITION p1 VALUES LESS THAN (TO_DAYS(‘20180401’)));
  • 新增分区:ALTER TABLE focus_p ADD PARTITION (PARTITION p_12 VALUES LESS THAN (TO_DAYS(‘20180401’)));
  • 新增分区注意:因原先最后一个分区是PARTITION p_11 VALUES LESS THAN MAXVALUE; 那么应该先合并一个分区再删除被合并的分区,然后在执行新增分区语句,分区名就用被合并的那个分区名;

修改表名

先修改旧表名,再把旧表名修改成新表名。

1
2
ALTER TABLE focus RENAME focus_old;
ALTER TABLE focus_p RENAME focus;

存在的风险

  1. 数据完整性的保障和耗时,存在一定的不可控。在把数据转移至新分区表时,服务一直在使用,就会出现在转移数据到完成数据转移的时间间隔中,出现新增数据在旧表上,而新表没有的情况。可以考虑在新表插入完旧表的数据后,做一个count(*)操作,先记录新表刚开始有多少数据量,再新表投入使用后,再count一下旧表的数据,count旧表减去count新表,代表之间的差值,用mysql提供的limit语法,把剩余的差值转移至分区表中。select * from table limit m,n。m表示指定从第几条数据开始取,n表示取多少条数据。因m的数值是从0开始的,所以m=count(*)即可,不用减1。从旧表中指定位置开始导入,n可以取旧表与新表之间的差值。整个数据转移的过程中,测试实验出的时间是总耗时12分6秒,生产环境还是会存在不确定性。
  2. 在新表表名替换过程中,服务不中断的情况下,有可能出现,已有事务正在向旧表提交请求。可以考虑不更改表名的操作,而在工程项目中的代码中,更改实体类所映射的表名。
  3. 新分区表是否需要加上旧表中的索引。
  4. 旧表数据是否需要删除。不建议删除,旧表数据的带有7个索引,DROP操作会产生较长的耗时,具体未验证过。可以考虑把它作为一个备份数据存在。

开启mysql定时任务

检查event事件是否开启:SHOW VARIABLES LIKE ‘event_scheduler’;

OFF则需要开启:SET GLOBAL event_scheduler = ON;

创建定时创建分区SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- 设置该事件使用或所属的数据库
USe test;
# 如果原来存在该名字的任务计划则先删除
DROP EVENT IF EXISTS create_part;
# 设置分隔符为 '//'
DELIMITER //
# 创建计划任务,设置第一次执行时间为'2019-01-01 00:00:00',并且每年执行一次
CREATE EVENT create_part ON SCHEDULE EVERY 1 YEAR STARTS TIMESTAMP '2019-01-01 00:00:00'
DO
# 开始该计划任务(每一年自动生成下一年的分区,按一个季度分一个区)
BEGIN
DECLARE v INT DEFAULT 0;
WHILE v < 4
DO
-- 分区的日期限制
SET @p_date = (SELECT DATE_ADD(curdate() - DAY(curdate()) + 1,INTERVAL (12 + 3*v) MONTH));
-- 分区的名
SET @p_name = (SELECT concat('focus_part_',date_format(@p_date,'%Y%m%d'))FROM DUAL);
-- 添加分区sql语句
SET @p_sql = concat('ALTER TABLE focus_p ADD PARTITION(PARTITION ',@p_name,' VALUES LESS THAN (TO_DAYS(\'',@p_date,'\')))');
-- 预处理sql,其中stmt是一个变量
PREPARE stmt FROM @p_sql;
-- 执行SQL语句
EXECUTE stmt;
-- 释放掉预处理段
DEALLOCATE PREPARE stmt;
SET v = v + 1;
END WHILE;
COMMIT;
-- 结束计划任务
END
//
# 将语句分割符设置回 ';'
DELIMITER ;
#查询事件
SELECT * FROM mysql.event;
#停止事件
ALTER EVENT create_part DISABLE;
#开启事件
ALTER EVENT create_part ENABLE;

资源使用情况

  • 空闲时:

  • 执行数据转移时:CPU使用率在30%-70%之间,内存占用率40%左右

谢谢你请我吃糖果

--------- 本文结束,感谢您的审阅 ---------
0%