pg 的前辈们进来帮看看索引的问题 - V2EX
V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
kyf0722
V2EX    PostgreSQL

pg 的前辈们进来帮看看索引的问题

  •  
  •   kyf0722 2019-03-23 21:32:28 +08:00 4608 次点击
    这是一个创建于 2444 天前的主题,其中的信息可能已经有所发展或是发生改变。

    我现在有一个数据库, 表 notice 大概 100W row 左右的数据

    表结构如下:

    CREATE TABLE public.notice ( id integer NOT NULL DEFAULT nextval('notice_id_seq'::regclass), title character varying(200) COLLATE pg_catalog."default" NOT NULL, ags integer[] DEFAULT '{}'::integer[], public_time timestamp without time zone, content text COLLATE pg_catalog."default", type integer NOT NULL, site character(20) COLLATE pg_catalog."default", url character varying(1024) COLLATE pg_catalog."default", area character(10) COLLATE pg_catalog."default", CONSTRAINT notice_pkey PRIMARY KEY (id) ) WITH ( OIDS = FALSE ) TABLESPACE pg_default; 

    索引如下:

    CREATE INDEX area_type_time_desc ON public.notice USING btree (area COLLATE pg_catalog."default" bpchar_pattern_ops, type, public_time DESC) TABLESPACE pg_default; 

    如下查询语句:

    explain analyze select id, public_time, title, area from notice where area ~ '^110' and type=31 limit 10 

    输出:

    "Limit (cost=0.43..39.49 rows=10 width=122) (actual time=0.930..0.977 rows=10 loops=1)" " -> Index Scan using area_type_time_desc on notice (cost=0.43..67845.25 rows=17369 width=122) (actual time=0.928..0.971 rows=10 loops=1)" " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))" " Filter: (area ~ '^110'::text)" "Planning Time: 0.532 ms" "Execution Time: 1.020 ms" 

    可以看出索引是生效的!

    当加上排序:

    explain analyze select id, public_time, title, area from notice where area ~ '^110' and type=31 order by public_time desc limit 10 

    输出:

    "Limit (cost=51918.94..51918.97 rows=10 width=122) (actual time=214.522..214.530 rows=10 loops=1)" " -> Sort (cost=51918.94..51962.36 rows=17369 width=122) (actual time=214.520..214.522 rows=10 loops=1)" " Sort Key: public_time" " Sort Method: top-N heapsort Memory: 29kB" " -> Bitmap Heap Scan on notice (cost=2012.57..51543.60 rows=17369 width=122) (actual time=30.013..205.214 rows=18635 loops=1)" " Recheck Cond: (type = 31)" " Filter: (area ~ '^110'::text)" " Heap Blocks: exact=15625" " -> Bitmap Index Scan on area_type_time_desc (cost=0.00..2008.23 rows=17362 width=0) (actual time=23.107..23.107 rows=18635 loops=1)" " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))" "Planning Time: 0.636 ms" "Execution Time: 214.591 ms" 

    排序耗时非常长, 没有用到索引排序

    我尝试只建立一个 public_time desc 的索引, 查询不加任何条件只有一个 order by 同样排序要很耗时

    请前辈们指正一下我哪里用的用问题, 我查了好多文档, 没有找到解决方案, 特来求助

    9 条回复    2019-06-19 16:36:31 +08:00
    zbinlin
        1
    zbinlin  
       2019-03-23 23:05:10 +08:00
    调整下 work_mem 或 maintenance_work_mem,重新建索引试试。
    xiangyuecn
        2
    xiangyuecn  
       2019-03-23 23:38:37 +08:00
    一楼正解,调大排序内存,虽然不懂 pg。Sort Method: top-N heapsort Memory: 29kB,这句翻译过来不知道是不是字面意思,1.8 万数据这么点内存去排序,200 毫秒算是蛮快了。
    reus
        3
    reus  
       2019-03-24 00:06:22 +08:00
    work_mem 太小可能是原因,但 work_mem 小并没有导致排序慢,而是触发了 recheck cond,所以才慢

    Bitmap Index Scan 这一步,如果 work_mem 太小,它不会返回 row id 而是 page id,这样就需要 Bitmap Heap Scan 里 recheck 这一步,因为一个 page 可能有不符合条件的 row。

    排序并不慢,Bitmap Heap Scan 实际跑了 205.214 毫秒,加上排序是 214.520 ,占比不高。
    kyf0722
        4
    kyf0722  
    OP
       2019-03-24 00:13:07 +08:00
    @zbinlin

    pg 默认的 work_mem 是 4M maintenance_work_mem 是 64M
    我修改当前 session 的 work_mem 到 128M, maintenance_work_mem 到 512M
    然后 drop index, 再重新 create index

    同样上面带 order by 的查询语句执行结果
    ```
    "Limit (cost=51977.56..51977.58 rows=10 width=122) (actual time=126.678..126.685 rows=10 loops=1)"
    " -> Sort (cost=51977.56..52021.04 rows=17394 width=122) (actual time=126.676..126.678 rows=10 loops=1)"
    " Sort Key: public_time DESC"
    " Sort Method: top-N heapsort Memory: 29kB"
    " -> Bitmap Heap Scan on notice (cost=2017.69..51601.68 rows=17394 width=122) (actual time=35.036..117.11 rows=18635 loops=1)"
    " Recheck Cond: (type = 31)"
    " Filter: (area ~ '^110'::text)"
    " Heap Blocks: exact=15625"
    " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..2013.34 rows=17386 width=0) (actual time=28.123..28.123 rows=18635 loops=1)"
    " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))"
    "Planning Time: 0.615 ms"
    "Execution Time: 126.819 ms"
    ```

    比刚刚确实快一些, 但是排序似乎还是和索引没有关系, 如果我把 order by 的 public_time 替换为 id, 可以在 2-5ms 内查出数据

    @xiangyuecn Sort Method: top-N heapsort Memory: 29kB 中的 29kB 应该是排序使用了 29K 的内存, 我试过如果数据量巨大, 内存不够的时候会显示 disk size, 也就是用到了磁盘
    kyf0722
        5
    kyf0722  
    OP
       2019-03-24 00:21:58 +08:00
    @reus 这位仁兄说的很有道理

    我索性直接把 work_mem 加到了 512M, 但是执行结果更加匪夷所思
    "Limit (cost=118704.89..118704.92 rows=10 width=134) (actual time=688.049..688.057 rows=10 loops=1)"
    " -> Sort (cost=118704.89..118816.86 rows=44786 width=134) (actual time=688.047..688.049 rows=10 loops=1)"
    " Sort Key: public_time DESC"
    " Sort Method: top-N heapsort Memory: 27kB"
    " -> Bitmap Heap Scan on notice (cost=1869.15..117737.08 rows=44786 width=134) (actual time=47.015..663.260 rows=44366 loops=1)"
    " Filter: ((area ~ '^110'::text) AND (type <> 31))"
    " Rows Removed by Filter: 18635"
    " Heap Blocks: exact=39610"
    " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..1857.96 rows=62153 width=0) (actual time=28.356..28.356 rows=63001 loops=1)"
    " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar))"
    "Planning Time: 0.578 ms"
    "Execution Time: 688.124 ms"

    无法理解了!!, 多次执行都差不多
    kyf0722
        6
    kyf0722  
    OP
       2019-03-24 00:25:52 +08:00
    @reus 抱歉, 错了, 请忽略上条回复. work_mem 加到 512M 之后, 的执行结果

    "Limit (cost=51977.56..51977.58 rows=10 width=134) (actual time=112.037..112.044 rows=10 loops=1)"
    " -> Sort (cost=51977.56..52021.04 rows=17394 width=134) (actual time=112.034..112.037 rows=10 loops=1)"
    " Sort Key: public_time DESC"
    " Sort Method: top-N heapsort Memory: 29kB"
    " -> Bitmap Heap Scan on notice (cost=2017.69..51601.68 rows=17394 width=134) (actual time=30.136..103.225 rows=18635 loops=1)"
    " Recheck Cond: (type = 31)"
    " Filter: (area ~ '^110'::text)"
    " Heap Blocks: exact=15625"
    " -> Bitmap Index Scan on area_type_public_time_desc (cost=0.00..2013.34 rows=17386 width=0) (actual time=22.173..22.173 rows=18635 loops=1)"
    " Index Cond: ((area ~>=~ '110'::bpchar) AND (area ~<~ '111'::bpchar) AND (type = 31))"
    "Planning Time: 0.571 ms"
    "Execution Time: 112.108 ms"

    还是触发了 Recheck
    reus
        7
    reus  
       2019-03-24 11:47:14 +08:00
    如果存储是 SSD,用 SET random_page_cost = 1; 看有没有改善。bitmap index scan 的目的是减少不必要的 page read,因为贵。但如果随机读 page 的成本低,planner 会倾向于直接读而不是用 bitmap index scan。

    还有个可能的原因是物理存储位置太分散,也会用 bitmap index scan,试下 vacuum full notice
    encro
        8
    encro  
       2019-06-19 16:33:58 +08:00
    这个组合索引顺序有点问题,先 type,再 area,在这个查询里,area 索引应该失效了。
    encro
        9
    encro  
       2019-06-19 16:36:31 +08:00
    area 因为是模糊查询,导致 public_time 也失效了。
    关于     帮助文档     自助推广系统     博客     API     FAQ     Solana     5370 人在线   最高记录 6679       Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 24ms UTC 01:36 PVG 09:36 LAX 17:36 JFK 20:36
    Do have faith in what you're doing.
    ubao msn snddm index pchome yahoo rakuten mypaper meadowduck bidyahoo youbao zxmzxm asda bnvcg cvbfg dfscv mmhjk xxddc yybgb zznbn ccubao uaitu acv GXCV ET GDG YH FG BCVB FJFH CBRE CBC GDG ET54 WRWR RWER WREW WRWER RWER SDG EW SF DSFSF fbbs ubao fhd dfg ewr dg df ewwr ewwr et ruyut utut dfg fgd gdfgt etg dfgt dfgd ert4 gd fgg wr 235 wer3 we vsdf sdf gdf ert xcv sdf rwer hfd dfg cvb rwf afb dfh jgh bmn lgh rty gfds cxv xcv xcs vdas fdf fgd cv sdf tert sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf shasha9178 shasha9178 shasha9178 shasha9178 shasha9178 liflif2 liflif2 liflif2 liflif2 liflif2 liblib3 liblib3 liblib3 liblib3 liblib3 zhazha444 zhazha444 zhazha444 zhazha444 zhazha444 dende5 dende denden denden2 denden21 fenfen9 fenf619 fen619 fenfe9 fe619 sdf sdf sdf sdf sdf zhazh90 zhazh0 zhaa50 zha90 zh590 zho zhoz zhozh zhozho zhozho2 lislis lls95 lili95 lils5 liss9 sdf0ty987 sdft876 sdft9876 sdf09876 sd0t9876 sdf0ty98 sdf0976 sdf0ty986 sdf0ty96 sdf0t76 sdf0876 df0ty98 sf0t876 sd0ty76 sdy76 sdf76 sdf0t76 sdf0ty9 sdf0ty98 sdf0ty987 sdf0ty98 sdf6676 sdf876 sd876 sd876 sdf6 sdf6 sdf9876 sdf0t sdf06 sdf0ty9776 sdf0ty9776 sdf0ty76 sdf8876 sdf0t sd6 sdf06 s688876 sd688 sdf86