Why did performance nosedive after splitting the database table? - V2EX

Why did performance nosedive after splitting the database table?

    Pendragon · 2020-06-17 23:10:58 +08:00 · 3619 views

    Originally the data was split by month into 12 tables, each holding roughly 20 million rows.

    ...
    CREATE TABLE `parcel_with_no_partition_m05` ( ... )

    CREATE TABLE `parcel_with_no_partition_m06` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'surrogate primary key, no business meaning',
      `equip_id` int(11) DEFAULT NULL,
      `scan_id` int(11) DEFAULT NULL,
      `create_datetime` datetime DEFAULT NULL,
      `minute_of_hour` int(11) DEFAULT NULL,
      `hour_of_day` int(11) DEFAULT NULL,
      `day_of_month` int(11) DEFAULT NULL,
      `week_of_year` int(11) DEFAULT NULL,
      `month_of_year` int(11) DEFAULT NULL,
      `parcel_length` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

    CREATE TABLE `parcel_with_no_partition_m07` ( ... )
    ...

    Now I want to try MySQL 8's native partitioning instead: a single table holding all 12 months, over 200 million rows in total, about 20 million rows per partition, with the same data as the original tables.

    CREATE TABLE `parcel_with_partition` (
      `equip_id` int(11) DEFAULT NULL,
      `scan_id` int(11) DEFAULT NULL,
      `create_datetime` datetime DEFAULT NULL,
      `minute_of_hour` int(11) DEFAULT NULL,
      `hour_of_day` int(11) DEFAULT NULL,
      `day_of_month` int(11) DEFAULT NULL,
      `week_of_year` int(11) DEFAULT NULL,
      `month_of_year` int(11) DEFAULT NULL,
      `parcel_length` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    PARTITION BY RANGE (`month_of_year`) (
      PARTITION `JAN` VALUES LESS THAN (2),
      PARTITION `FEB` VALUES LESS THAN (3),
      PARTITION `MAR` VALUES LESS THAN (4),
      PARTITION `APR` VALUES LESS THAN (5),
      PARTITION `MAY` VALUES LESS THAN (6),
      PARTITION `JUN` VALUES LESS THAN (7),
      PARTITION `JUL` VALUES LESS THAN (8),
      PARTITION `AUG` VALUES LESS THAN (9),
      PARTITION `SEP` VALUES LESS THAN (10),
      PARTITION `OCT` VALUES LESS THAN (11),
      PARTITION `NOV` VALUES LESS THAN (12),
      PARTITION `DEC` VALUES LESS THAN (13)
    );

    I ran one of our business SQL statements and found it took about twice as long.

    # query against the native MySQL partition
    select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
    from `parcel_with_partition` partition(`JUN`)
    where `equip_id` = 3
      and `create_datetime` > '2020-06-01'
      and `create_datetime` <= '2020-06-29'
    group by `scan_id`, `day_of_month`, `hour_of_day`

    2018 rows in set (59.56 sec)

    # query against the original per-month table
    select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
    from `parcel_with_no_partition_m06`
    where `equip_id` = 3
      and `create_datetime` > '2020-06-01'
      and `sort_datetime` <= '2020-06-29'
    group by `scan_id`, `day_of_month`, `hour_of_day`

    2018 rows in set, 1 warning (29.13 sec)

    I then examined both statements with EXPLAIN and the performance schema.

    Native-partition query details

    explain:
    +----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
    | id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
    +----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
    |  1 | SIMPLE      | parcel_with_partition | JUN        | ALL  | NULL          | NULL | NULL    | NULL | 20481855 |     1.11 | Using where; Using temporary |
    +----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

    table i/o:
    select * from table_io_waits_summary_by_table where count_star>0 \G
    *************************** 1. row ***************************
    OBJECT_TYPE: TABLE
    OBJECT_SCHEMA: test
    OBJECT_NAME: parcel_with_partition
    COUNT_STAR: 22123040
    SUM_TIMER_WAIT: 55709022384855
    MIN_TIMER_WAIT: 55709022384855
    AVG_TIMER_WAIT: 2517879
    MAX_TIMER_WAIT: 55709022384855
    COUNT_READ: 22123040
    SUM_TIMER_READ: 55709022384855
    MIN_TIMER_READ: 55709022384855
    AVG_TIMER_READ: 2517879
    MAX_TIMER_READ: 55709022384855
    COUNT_WRITE: 0
    SUM_TIMER_WRITE: 0
    MIN_TIMER_WRITE: 0
    AVG_TIMER_WRITE: 0
    MAX_TIMER_WRITE: 0
    COUNT_FETCH: 22123040
    SUM_TIMER_FETCH: 55709022384855
    MIN_TIMER_FETCH: 55709022384855
    AVG_TIMER_FETCH: 2517879
    MAX_TIMER_FETCH: 55709022384855
    COUNT_INSERT: 0
    SUM_TIMER_INSERT: 0
    MIN_TIMER_INSERT: 0
    AVG_TIMER_INSERT: 0
    MAX_TIMER_INSERT: 0
    COUNT_UPDATE: 0
    SUM_TIMER_UPDATE: 0
    MIN_TIMER_UPDATE: 0
    AVG_TIMER_UPDATE: 0
    MAX_TIMER_UPDATE: 0
    COUNT_DELETE: 0
    SUM_TIMER_DELETE: 0
    MIN_TIMER_DELETE: 0
    AVG_TIMER_DELETE: 0
    MAX_TIMER_DELETE: 0

    file i/o:
    FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_partition#p#jun.ibd
    EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 140081009323904
    COUNT_STAR: 422002
    SUM_TIMER_WAIT: 3965120084812
    MIN_TIMER_WAIT: 5739112
    AVG_TIMER_WAIT: 9395831
    MAX_TIMER_WAIT: 498159493
    COUNT_READ: 422002
    SUM_TIMER_READ: 3965120084812
    MIN_TIMER_READ: 5739112
    AVG_TIMER_READ: 9395831
    MAX_TIMER_READ: 498159493
    SUM_NUMBER_OF_BYTES_READ: 6914080768
    COUNT_WRITE: 0
    SUM_TIMER_WRITE: 0
    MIN_TIMER_WRITE: 0
    AVG_TIMER_WRITE: 0
    MAX_TIMER_WRITE: 0
    SUM_NUMBER_OF_BYTES_WRITE: 0
    COUNT_MISC: 0
    SUM_TIMER_MISC: 0
    MIN_TIMER_MISC: 0
    AVG_TIMER_MISC: 0
    MAX_TIMER_MISC: 0

    Original per-month table query details

    explain:
    +----+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
    | id | select_type | table                        | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
    +----+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
    |  1 | SIMPLE      | parcel_with_no_partition_m06 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 22123040 |     1.11 | Using where; Using temporary |
    +----+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+

    table i/o:
    OBJECT_TYPE: TABLE
    OBJECT_SCHEMA: test
    OBJECT_NAME: parcel_with_no_partition_m06
    COUNT_STAR: 22123040
    SUM_TIMER_WAIT: 23182194259854
    MIN_TIMER_WAIT: 23182194259854
    AVG_TIMER_WAIT: 1047813
    MAX_TIMER_WAIT: 23182194259854
    COUNT_READ: 22123040
    SUM_TIMER_READ: 23182194259854
    MIN_TIMER_READ: 23182194259854
    AVG_TIMER_READ: 1047813
    MAX_TIMER_READ: 23182194259854
    COUNT_WRITE: 0
    SUM_TIMER_WRITE: 0
    MIN_TIMER_WRITE: 0
    AVG_TIMER_WRITE: 0
    MAX_TIMER_WRITE: 0
    COUNT_FETCH: 22123040
    SUM_TIMER_FETCH: 23182194259854
    MIN_TIMER_FETCH: 23182194259854
    AVG_TIMER_FETCH: 1047813
    MAX_TIMER_FETCH: 23182194259854
    COUNT_INSERT: 0
    SUM_TIMER_INSERT: 0
    MIN_TIMER_INSERT: 0
    AVG_TIMER_INSERT: 0
    MAX_TIMER_INSERT: 0
    COUNT_UPDATE: 0
    SUM_TIMER_UPDATE: 0
    MIN_TIMER_UPDATE: 0
    AVG_TIMER_UPDATE: 0
    MAX_TIMER_UPDATE: 0
    COUNT_DELETE: 0
    SUM_TIMER_DELETE: 0
    MIN_TIMER_DELETE: 0
    AVG_TIMER_DELETE: 0
    MAX_TIMER_DELETE: 0

    file i/o:
    FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_no_partition_m06.ibd
    EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 140081009372480
    COUNT_STAR: 424687
    SUM_TIMER_WAIT: 4045883911263
    MIN_TIMER_WAIT: 5301220
    AVG_TIMER_WAIT: 9526557
    MAX_TIMER_WAIT: 861143891
    COUNT_READ: 424687
    SUM_TIMER_READ: 4045883911263
    MIN_TIMER_READ: 5301220
    AVG_TIMER_READ: 9526557
    MAX_TIMER_READ: 861143891
    SUM_NUMBER_OF_BYTES_READ: 6958071808
    COUNT_WRITE: 0
    SUM_TIMER_WRITE: 0
    MIN_TIMER_WRITE: 0
    AVG_TIMER_WRITE: 0
    MAX_TIMER_WRITE: 0
    SUM_NUMBER_OF_BYTES_WRITE: 0
    COUNT_MISC: 0
    SUM_TIMER_MISC: 0
    MIN_TIMER_MISC: 0
    AVG_TIMER_MISC: 0
    MAX_TIMER_MISC: 0

    Neither EXPLAIN nor the file i/o stats show anything unusual, but in the table i/o stats the native partition spends far more time than the original per-month table.
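One detail that helps when reading the table i/o numbers above: performance_schema timers are reported in picoseconds by default (1 s = 1e12 ps), so the totals convert directly to seconds. A quick check, with the SUM_TIMER_FETCH and AVG_TIMER_FETCH values copied from the two runs quoted above:

```sql
-- performance_schema timers default to picoseconds: 1e12 ps = 1 s.
-- Values copied from the two table_io_waits_summary_by_table rows above.
SELECT
    55709022384855 / 1e12 AS partition_fetch_total_s,  -- ~55.7 s
    23182194259854 / 1e12 AS split_fetch_total_s,      -- ~23.2 s
    2517879 / 1e6         AS partition_avg_fetch_us,   -- ~2.5 us per row
    1047813 / 1e6         AS split_avg_fetch_us;       -- ~1.0 us per row
```

Both runs fetch the same 22,123,040 rows, so the gap is per-row handler cost inside the server rather than extra i/o volume, which is consistent with the near-identical file i/o figures.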

    Does that mean the culprit for the wasted time is the logical i/o part? Why would partitioning produce this result? Any pointers would be much appreciated.

    Environment details: OS: CentOS 7; database: MySQL 8 (storage engine: InnoDB); server: 12 cores @ 2.5 GHz, 64 GB RAM; disk: spinning HDD. Peak memory usage while running the query was about 3 GB.

    10 replies · last at 2020-07-01 17:20:42 +08:00
    1  xuanbg  2020-06-17 23:36:50 +08:00
    Are 20 million rows and 240 million rows really comparable? A table with PARTITION BY RANGE is still one table; the data is just stored in separate pieces.
    2  ajsonx  2020-06-18 00:02:23 +08:00
    I've tried a few of MySQL 8's window functions; they're mainly a coding convenience. With about 10 million rows, none of them beat a hand-written join in performance.
    3  Pendragon (OP)  2020-06-18 00:09:11 +08:00
    @xuanbg That was my first thought too, but the profiling shows the disk i/o time of the two queries is the same, and the peak memory used matches the size of a single month's table/partition, so the partitioned query doesn't appear to be reading all 240 million rows.
    4  mahone3297  2020-06-18 00:18:12 +08:00
    # query against the native MySQL partition
    select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
    from `parcel_with_partition` partition(`JUN`)
    where `equip_id` = 3 and `create_datetime` > '2020-06-01' and `create_datetime` <= '2020-06-29'
    group by `scan_id`, `day_of_month`, `hour_of_day`
    2018 rows in set (59.56 sec)

    # query against the original per-month table
    select sql_no_cache count(*) as num, `day_of_month` as day, `hour_of_day` as hour, `scan_id` as scanId
    from `parcel_with_no_partition_m06`
    where `equip_id` = 3 and `create_datetime` > '2020-06-01' and `sort_datetime` <= '2020-06-29'
    group by `scan_id`, `day_of_month`, `hour_of_day`
    2018 rows in set, 1 warning (29.13 sec)

    * The conditions in the two queries differ?
      partition: `create_datetime` > '2020-06-01' and `create_datetime` <= '2020-06-29'
      split table: `create_datetime` > '2020-06-01' and `sort_datetime` <= '2020-06-29'
    * Also, you pin one partition with from `parcel_with_partition` partition(`JUN`); can you compare that against querying something like from parcel_with_partition_JUN?
    5  mahone3297  2020-06-18 00:19:28 +08:00
    The way a partitioned table is meant to be used is that the partition key appears in your WHERE clause, so the SQL only has to touch a few partitions instead of all of them.
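As a sketch of the usage described in the reply above (using the table and partition names from the original post), partition pruning can be confirmed from the `partitions` column of EXPLAIN, with no explicit partition(...) hint:

```sql
-- With the partition key in the WHERE clause, the optimizer should prune
-- down to a single partition on its own.
EXPLAIN
SELECT count(*) AS num, `day_of_month`, `hour_of_day`, `scan_id`
FROM `parcel_with_partition`
WHERE `month_of_year` = 6          -- partition key: only JUN should be scanned
  AND `equip_id` = 3
  AND `create_datetime` > '2020-06-01'
  AND `create_datetime` <= '2020-06-29'
GROUP BY `scan_id`, `day_of_month`, `hour_of_day`;
-- Expect the `partitions` column of the EXPLAIN output to list only JUN.
```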
    6  flashrick  2020-06-18 09:40:30 +08:00
    The way I see it, you're querying the partitioned table wrong. You partitioned by month_of_year, so an ordinary SELECT is enough as long as the WHERE clause includes month_of_year. Go easy on me if I'm wrong.
    7  Pendragon (OP)  2020-06-18 09:54:05 +08:00
    @flashrick
    @mahone3297
    You're right: specifying a month_of_year value in the WHERE clause also hits the corresponding partition. I used partition(...) here to pin the partition explicitly and rule out interference from the other conditions; the results are the same either way.
    8  Still4  2020-06-18 10:00:05 +08:00
    I looked at how partitioned tables are implemented: one file per partition, so the two tables being compared have files of the same size. With no index, both are full table scans, and the execution times shouldn't differ this much.

    Could you make another copy of the parcel_with_no_partition_m06 table, to rule out inner-track vs outer-track effects on the spinning disk?
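As the reply above notes, both EXPLAIN outputs show type: ALL with possible_keys: NULL, i.e. full scans. A composite index matching the WHERE clause would sidestep the scan entirely; this is only a suggestion (the index name here is made up), not something tested in the thread:

```sql
-- Covers the equality on equip_id plus the range on create_datetime.
-- On a partitioned table this becomes a local index inside each partition.
ALTER TABLE `parcel_with_partition`
    ADD KEY `idx_equip_create` (`equip_id`, `create_datetime`);
```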
    9  someonedeng  2020-06-24 17:13:16 +08:00 via Android
    By the way, since the data was already split into tables, why merge it into one big table and then partition that, instead of partitioning the small tables directly?
    10  encro  2020-07-01 17:20:42 +08:00
    Since you always query by create_datetime anyway, just partition by create_datetime.

    http://mysql.taobao.org/monthly/2017/11/09/
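A sketch of partitioning directly on the query column, as suggested above (the table name is made up here; MySQL's partition pruner understands TO_DAYS() on a datetime column in RANGE partitioning):

```sql
CREATE TABLE `parcel_by_date` (
    `equip_id` int(11) DEFAULT NULL,
    `scan_id` int(11) DEFAULT NULL,
    `create_datetime` datetime NOT NULL,
    `parcel_length` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (TO_DAYS(`create_datetime`)) (
    PARTITION `p202005` VALUES LESS THAN (TO_DAYS('2020-06-01')),
    PARTITION `p202006` VALUES LESS THAN (TO_DAYS('2020-07-01')),
    PARTITION `p202007` VALUES LESS THAN (TO_DAYS('2020-08-01')),
    PARTITION `pmax`    VALUES LESS THAN MAXVALUE
);
-- A range predicate on create_datetime alone now prunes partitions,
-- with no need for the derived month_of_year column in the WHERE clause.
```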