
The data was originally split into 12 monthly tables, each holding roughly 20 million rows.
... CREATE TABLE `parcel_with_no_partition_m05` ( ... )

CREATE TABLE `parcel_with_no_partition_m06` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'surrogate primary key, no business meaning',
  `equip_id` int(11) DEFAULT NULL,
  `scan_id` int(11) DEFAULT NULL,
  `create_datetime` datetime DEFAULT NULL,
  `minute_of_hour` int(11) DEFAULT NULL,
  `hour_of_day` int(11) DEFAULT NULL,
  `day_of_month` int(11) DEFAULT NULL,
  `week_of_year` int(11) DEFAULT NULL,
  `month_of_year` int(11) DEFAULT NULL,
  `parcel_length` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE `parcel_with_no_partition_m07` ( ... )
...

Now I want to try MySQL 8's native partitioning instead: all 12 months, over 200 million rows in total, stored in a single table with roughly 20 million rows per partition, so the data matches the original monthly tables.
CREATE TABLE `parcel_with_partition` (
  `equip_id` int(11) DEFAULT NULL,
  `scan_id` int(11) DEFAULT NULL,
  `create_datetime` datetime DEFAULT NULL,
  `minute_of_hour` int(11) DEFAULT NULL,
  `hour_of_day` int(11) DEFAULT NULL,
  `day_of_month` int(11) DEFAULT NULL,
  `week_of_year` int(11) DEFAULT NULL,
  `month_of_year` int(11) DEFAULT NULL,
  `parcel_length` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE(`month_of_year`) (
  PARTITION `JAN` VALUES LESS THAN (2),
  PARTITION `FEB` VALUES LESS THAN (3),
  PARTITION `MAR` VALUES LESS THAN (4),
  PARTITION `APR` VALUES LESS THAN (5),
  PARTITION `MAY` VALUES LESS THAN (6),
  PARTITION `JUN` VALUES LESS THAN (7),
  PARTITION `JUL` VALUES LESS THAN (8),
  PARTITION `AUG` VALUES LESS THAN (9),
  PARTITION `SEP` VALUES LESS THAN (10),
  PARTITION `OCT` VALUES LESS THAN (11),
  PARTITION `NOV` VALUES LESS THAN (12),
  PARTITION `DEC` VALUES LESS THAN (13)
);

I then ran one of my business SQL queries against it and found it took roughly twice as long.
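Before looking at the timings, the per-partition row counts can be sanity-checked against the old monthly tables with something along these lines (a sketch, not from the original post; InnoDB's TABLE_ROWS is only an estimate):

select partition_name, table_rows
from information_schema.partitions
where table_schema = 'test'
  and table_name = 'parcel_with_partition';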
# Query against the MySQL native partition
select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
from `parcel_with_partition` partition(`JUN`)
where `equip_id` = 3
  and `create_datetime` > '2020-06-01'
  and `create_datetime` <= '2020-06-29'
group by `scan_id`, `day_of_month`, `hour_of_day`;
2018 rows in set (59.56 sec)

# Query against the original per-month table
select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
from `parcel_with_no_partition_m06`
where `equip_id` = 3
  and `create_datetime` > '2020-06-01'
  and `sort_datetime` <= '2020-06-29'
group by `scan_id`, `day_of_month`, `hour_of_day`;
2018 rows in set, 1 warning (29.13 sec)

I then looked at both queries more closely with EXPLAIN and the Performance Schema.
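The table I/O figures below come from performance_schema.table_io_waits_summary_by_table (that query is visible in the output); the file I/O figures appear to come from performance_schema.file_summary_by_instance. Roughly how those numbers can be pulled for just these tables, and reset between runs (the filtering and the reset step are my assumptions, not shown in the post):

-- logical, row-level waits per table
select * from performance_schema.table_io_waits_summary_by_table
where count_star > 0\G

-- physical file I/O per tablespace file (the LIKE filter is an assumption)
select * from performance_schema.file_summary_by_instance
where file_name like '%parcel_with%'\G

-- optional: reset the counters before each test query
truncate table performance_schema.table_io_waits_summary_by_table;
truncate table performance_schema.file_summary_by_instance;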
Details of the query against the MySQL native partition
explain:
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE      | parcel_with_partition | JUN        | ALL  | NULL          | NULL | NULL    | NULL | 20481855 |     1.11 | Using where; Using temporary |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

table i/o:
select * from table_io_waits_summary_by_table where count_star > 0 \G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: parcel_with_partition
COUNT_STAR: 22123040
SUM_TIMER_WAIT: 55709022384855
MIN_TIMER_WAIT: 55709022384855
AVG_TIMER_WAIT: 2517879
MAX_TIMER_WAIT: 55709022384855
COUNT_READ: 22123040
SUM_TIMER_READ: 55709022384855
MIN_TIMER_READ: 55709022384855
AVG_TIMER_READ: 2517879
MAX_TIMER_READ: 55709022384855
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
COUNT_FETCH: 22123040
SUM_TIMER_FETCH: 55709022384855
MIN_TIMER_FETCH: 55709022384855
AVG_TIMER_FETCH: 2517879
MAX_TIMER_FETCH: 55709022384855
COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0

file i/o:
FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_partition#p#jun.ibd
EVENT_NAME: wait/io/file/innodb/innodb_data_file
OBJECT_INSTANCE_BEGIN: 140081009323904
COUNT_STAR: 422002
SUM_TIMER_WAIT: 3965120084812
MIN_TIMER_WAIT: 5739112
AVG_TIMER_WAIT: 9395831
MAX_TIMER_WAIT: 498159493
COUNT_READ: 422002
SUM_TIMER_READ: 3965120084812
MIN_TIMER_READ: 5739112
AVG_TIMER_READ: 9395831
MAX_TIMER_READ: 498159493
SUM_NUMBER_OF_BYTES_READ: 6914080768
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
COUNT_MISC: 0
SUM_TIMER_MISC: 0
MIN_TIMER_MISC: 0
AVG_TIMER_MISC: 0
MAX_TIMER_MISC: 0

Details of the query against the original per-month table
explain:
+----+------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                       | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE     | parcel_with_no_partition_m06 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 22123040 |     1.11 | Using where; Using temporary |
+----+------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

table i/o:
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: parcel_with_no_partition_m06
COUNT_STAR: 22123040
SUM_TIMER_WAIT: 23182194259854
MIN_TIMER_WAIT: 23182194259854
AVG_TIMER_WAIT: 1047813
MAX_TIMER_WAIT: 23182194259854
COUNT_READ: 22123040
SUM_TIMER_READ: 23182194259854
MIN_TIMER_READ: 23182194259854
AVG_TIMER_READ: 1047813
MAX_TIMER_READ: 23182194259854
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
COUNT_FETCH: 22123040
SUM_TIMER_FETCH: 23182194259854
MIN_TIMER_FETCH: 23182194259854
AVG_TIMER_FETCH: 1047813
MAX_TIMER_FETCH: 23182194259854
COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0

file i/o:
FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_no_partition_m06.ibd
EVENT_NAME: wait/io/file/innodb/innodb_data_file
OBJECT_INSTANCE_BEGIN: 140081009372480
COUNT_STAR: 424687
SUM_TIMER_WAIT: 4045883911263
MIN_TIMER_WAIT: 5301220
AVG_TIMER_WAIT: 9526557
MAX_TIMER_WAIT: 861143891
COUNT_READ: 424687
SUM_TIMER_READ: 4045883911263
MIN_TIMER_READ: 5301220
AVG_TIMER_READ: 9526557
MAX_TIMER_READ: 861143891
SUM_NUMBER_OF_BYTES_READ: 6958071808
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
COUNT_MISC: 0
SUM_TIMER_MISC: 0
MIN_TIMER_MISC: 0
AVG_TIMER_MISC: 0
MAX_TIMER_MISC: 0

Neither the EXPLAIN output nor the file I/O statistics show anything abnormal, but in the table I/O statistics the native partition spends far more time than the original per-month table.
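As a back-of-the-envelope reading of the numbers above (Performance Schema timers are reported in picoseconds): the file I/O totals are almost identical, roughly 3.97 s vs 4.05 s with about 6.9 GB read in both cases, while the table-level fetch time is roughly 55.7 s vs 23.2 s over the same 22,123,040 row fetches, i.e. about 2.5 µs vs 1.0 µs per row. A query along these lines (a sketch) converts the raw timers:

select object_name,
       sum_timer_fetch / 1e12 as fetch_seconds,       -- ~55.7 vs ~23.2
       avg_timer_fetch / 1e6  as avg_fetch_microsecs  -- ~2.5 vs ~1.0
from performance_schema.table_io_waits_summary_by_table
where object_schema = 'test'
  and object_name in ('parcel_with_partition', 'parcel_with_no_partition_m06');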
Does this mean the culprit behind the wasted time is the logical I/O part? Why does partitioning lead to this result? Any pointers would be much appreciated.
Some details about the environment: OS: CentOS 7; database: MySQL 8 (InnoDB); server: 12 cores @ 2.5 GHz, 64 GB RAM; storage: spinning HDD. Peak memory usage while running the query was about 3 GB.
1 xuanbg 2020-06-17 23:36:50 +08:00 How can 20 million rows and 240 million rows behave the same? A table using PARTITION BY RANGE is still one table; only the data is stored in separate pieces. |
2 ajsonx 2020-06-18 00:02:23 +08:00 I've tried a few of MySQL 8's window functions, mainly for convenience of writing. With around 10 million rows they all performed worse than the equivalent join queries. |
3 Pendragon OP @xuanbg That was my first thought too, but the analysis shows the disk I/O time is the same for both queries, and the peak memory usage corresponds to the size of a single monthly table / partition, so the partitioned query doesn't appear to be pulling in all 240 million rows. |
4 mahone3297 2020-06-18 00:18:12 +08:00
# Query against the MySQL native partition
select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
from `parcel_with_partition` partition(`JUN`)
where `equip_id` = 3 and `create_datetime` > '2020-06-01' and `create_datetime` <= '2020-06-29'
group by `scan_id`, `day_of_month`, `hour_of_day`
2018 rows in set (59.56 sec)

# Query against the original per-month table
select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
from `parcel_with_no_partition_m06`
where `equip_id` = 3 and `create_datetime` > '2020-06-01' and `sort_datetime` <= '2020-06-29'
group by `scan_id`, `day_of_month`, `hour_of_day`
2018 rows in set, 1 warning (29.13 sec)

* The two queries have different conditions? The partition query uses `create_datetime` > '2020-06-01' and `create_datetime` <= '2020-06-29', while the per-month query uses `create_datetime` > '2020-06-01' and `sort_datetime` <= '2020-06-29'.
* Also, you're pinning one partition with from `parcel_with_partition` partition(`JUN`)? Could you compare it by writing from parcel_with_partition_JUN instead? |
5 mahone3297 2020-06-18 00:19:28 +08:00 The way a partitioned table is meant to be used is that your WHERE clause includes the partitioning condition, so the SQL can go straight to just the relevant partitions instead of all of them. |
6 flashrick 2020-06-18 09:40:30 +08:00 As I see it, you're querying the partitioned table the wrong way. Since you partitioned on month_of_year, the query should just be an ordinary SELECT; you only need to include month_of_year in the WHERE clause. Go easy on me if I've got this wrong. |
7 Pendragon OP @flashrick @mahone3297 You're right: specifying the month_of_year value in the WHERE clause also hits the corresponding partition. I used PARTITION() here to pin the partition explicitly, to reduce the influence of the other conditions on the query; the end result is the same either way. |
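For anyone following along, the two equivalent ways of hitting a single partition discussed above look roughly like this (a sketch based on the schema in the post, not copied from the thread):

-- pruning via the partitioning column in the WHERE clause
explain select count(*) from `parcel_with_partition`
where `month_of_year` = 6 and `equip_id` = 3;

-- pinning the partition explicitly
explain select count(*) from `parcel_with_partition` partition (`JUN`)
where `equip_id` = 3;

-- in both cases the partitions column of the EXPLAIN output should show only JUN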
8 Still4 2020-06-18 10:00:05 +08:00 I looked at how partitioned tables are implemented: each partition is its own file, so the two files being compared should be roughly the same size. Without an index, both queries are full table scans, so the execution times shouldn't differ this much. Could you make another copy of parcel_with_no_partition_m06 and rerun the test, to rule out the effect of inner vs. outer tracks on the spinning disk? |
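A sketch of that control experiment (the copy's table name is made up):

create table `parcel_with_no_partition_m06_copy` like `parcel_with_no_partition_m06`;
insert into `parcel_with_no_partition_m06_copy` select * from `parcel_with_no_partition_m06`;
-- then rerun the same group-by query against the copy and compare the timing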
9 someonedeng 2020-06-24 17:13:16 +08:00 via Android Out of curiosity, since the data was already split into separate tables, why merge it into one big table and then partition it, rather than partitioning each small table directly? |
10 encro 2020-07-01 17:20:42 +08:00 |