MongoDB keeps crashing with errno:24 Too many open files, help needed - V2EX
    comwrg  Aug 2, 2019  15828 views

    Partial log:

    2019-08-01T23:59:02.301+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:02.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:03.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:03.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:04.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:04.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:05.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:05.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:06.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:06.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:07.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:07.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:08.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:08.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:09.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:09.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:10.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:10.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:11.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:11.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:12.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:12.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:13.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:13.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:14.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:14.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:15.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:15.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:16.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:16.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:17.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:17.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:18.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:18.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:19.295+0800 W NETWORK [HostnameCanonicalizationWorker] Failed to obtain address information for hostname iZuf61zao4uxbprumx45dlZ: System error
    2019-08-01T23:59:19.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:19.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:20.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:20.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:21.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:21.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:22.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:22.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:23.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:23.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:23.631+0800 E STORAGE [thread2] WiredTiger (24) [1564675163:631372][9783:0x7f4e30730700], file:WiredTiger.wt, WT_SESSION.checkpoint: /var/lib/mongodb/WiredTiger.turtle: handle-open: open: Too many open files
    2019-08-01T23:59:23.632+0800 E STORAGE [thread2] WiredTiger (24) [1564675163:632761][9783:0x7f4e30730700], checkpoint-server: checkpoint server error: Too many open files
    2019-08-01T23:59:23.632+0800 E STORAGE [thread2] WiredTiger (-31804) [1564675163:632802][9783:0x7f4e30730700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
    2019-08-01T23:59:23.632+0800 I - [thread2] Fatal Assertion 28558
    2019-08-01T23:59:23.632+0800 I - [thread2] ***aborting after fassert() failure
    2019-08-01T23:59:23.638+0800 F - [thread2] Got signal: 6 (Aborted).
    ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 31862
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65535
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 31862
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

    I have set fs.file-max = 2097152 in sysctl.conf.

    It crashes every day, and I really can't figure out the root cause.

    Supplement 1    Aug 2, 2019
    mongod --version
    db version v3.2.11
    git version: 009580ad490190ba33d1c6253ebd8d91808923e4
    OpenSSL version: OpenSSL 1.0.2s 28 May 2019
    allocator: tcmalloc
    modules: none
    build environment:
        distarch: x86_64
        target_arch: x86_64
    22 replies    2020-06-05 13:56:02 +08:00
    KYLINZZ
        1
    KYLINZZ  
       Aug 2, 2019   1
    auser
        2
    auser  
       Aug 2, 2019   1
    I suggest checking the /proc/PID/limits file to see how many FDs the process can actually open.
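For readers who want to script this check, here is a minimal sketch that pulls the "Max open files" row out of the text of /proc/PID/limits. The function names are made up for illustration, and it assumes the usual Linux layout of that file (row label, soft limit, hard limit, units).

```python
def _to_int(value):
    # "unlimited" has no numeric value; represent it as None.
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.strip().startswith("Max open files"):
            # The row looks like: "Max open files  64000  64000  files"
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def read_open_files_limit(pid):
    """Read the limit of a live process (Linux only, needs permission to read /proc)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files_limit(f.read())
```

Calling `read_open_files_limit(pid)` with the pid of the running mongod reports the limit that process actually has, which can differ from what `ulimit -a` prints in a login shell.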
    comwrg
        3
    comwrg  
    OP
       Aug 2, 2019
    @auser


    ```
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            8388608              unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             64000                64000                processes
    Max open files            64000                64000                files
    Max locked memory         unlimited            unlimited            bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       31862                31862                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0
    Max realtime timeout      unlimited            unlimited            us
    ```

    I checked, and it looks fine.
    auser
        4
    auser  
       Aug 2, 2019   1
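    @comwrg Check the number of TCP connections with ss or netstat, and see whether the mongodb process has too many of them. If it does, use the TCP states of those connections to narrow down where the problem is and what exactly is using up the file descriptors. For example, a denial-of-service attack could leave a large number of empty TCP connections open.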
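A rough sketch of the per-state count described above, working on the text printed by `ss -tan`. The function name and the optional port filter are illustrative assumptions, and it assumes the common column order of State, Recv-Q, Send-Q, local address, peer address.

```python
from collections import Counter


def count_tcp_states(ss_output, local_port=None):
    """Count connections per TCP state from the text printed by `ss -tan`."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, whose first column is "State".
        if len(fields) < 5 or fields[0] == "State":
            continue
        # fields[3] is the local address:port column.
        if local_port is not None and not fields[3].endswith(f":{local_port}"):
            continue
        counts[fields[0]] += 1
    return counts
```

Feeding it the output of `subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout` with the port mongod listens on gives a quick picture of whether most connections are established, closing, or stuck in some other state.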

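    Each network connection takes one file descriptor (fd), and so does each file opened for reading or writing. Judging by the error log, the first failure was running out of file descriptors, so new network connections could not get an fd and accept (the system call that accepts new connections) failed. That by itself is tolerable. But for a database, once files cannot be written to disk and data cannot be persisted, deliberately crashing is the right thing to do.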

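    As for the OP's problem, I think it is quite likely that some frequently called code path opens files without closing them afterwards, so the fds are never released and the limit is eventually reached. The OP should now investigate from two angles: the network (as described in my first paragraph) and the /proc/PID/fd/ directory.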
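One way to look at /proc/PID/fd/ is to group the descriptors by what they point at, so sockets and regular files can be told apart. The sketch below is Linux-only, and the category names and function names are assumptions chosen for illustration.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.startswith("/"):
        return "file"
    return "other"


def summarize_fds(pid):
    """Count the open descriptors of a process by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

If the "socket" count dominates, the network side is the place to dig; if "file" dominates, the paths themselves (for example files under the database directory) usually hint at what is being kept open.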
    est
        5
    est  
       Aug 2, 2019
    You've run out of inodes.
    comwrg
        6
    comwrg  
    OP
       Aug 2, 2019
    comwrg
        7
    comwrg  
    OP
       Aug 2, 2019
    @KYLINZZ
    The version mentioned there is 2.6.7, which doesn't match mine. That bug is also rather old.
    MilkShake
        8
    MilkShake  
       Aug 2, 2019
    This usually means the disk is out of space, or else the inodes are used up.
    julyclyde
        9
    julyclyde  
       Aug 2, 2019   1
    Using ulimit or /etc/security/limits.conf to check and change the limit is a classic mistake.

    The rlimit of a background service has to be set at the place where it is started.
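To illustrate why the launch point matters: resource limits belong to each process and are inherited from whatever starts it, so a value typed into an interactive shell does not reach a daemon started elsewhere. The sketch below uses a hypothetical helper, `run_with_nofile_limit`, to set the soft RLIMIT_NOFILE in the child right before it executes. On systemd-managed hosts the equivalent knob is typically the unit's `LimitNOFILE=` setting; how the OP's mongod was started is not stated in the thread.

```python
import resource
import subprocess
import sys


def run_with_nofile_limit(argv, soft_limit):
    """Start argv with the soft RLIMIT_NOFILE set inside the child itself."""
    def set_limit():
        # Runs in the child after fork() and before exec(), so only the
        # child (and whatever it starts) sees the new limit.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    return subprocess.run(argv, preexec_fn=set_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # The child reports the limit it actually received.
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    print(run_with_nofile_limit([sys.executable, "-c", code], 64).stdout.strip())
```

The parent's own limit is unchanged after the call, which is the same reason a running mongod keeps whatever limit it was started with.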
    bigpigB
        10
    bigpigB  
       Aug 2, 2019 via Android
    Raise ulimit a bit.
    neverfall
        11
    neverfall  
       Aug 2, 2019
    Opening without ever closing?
    Remember to close.
    comwrg
        12
    comwrg  
    OP
       Aug 2, 2019
    @est @aaa5838769 Neither of those helped.
    comwrg
        13
    comwrg  
    OP
       Aug 2, 2019
    @est @aaa5838769 Neither of those is the case.
    comwrg
        14
    comwrg  
    OP
       Aug 2, 2019
    @auser Thank you very much, I have investigated along the lines you suggested.

    It turns out mongodb is holding a lot of fds (24135/38839), more than half of the total.

    ![image]( https://user-images.githubusercontent.com/19854253/62348661-efa26b00-b52f-11e9-80be-b1eef07c061b.png)

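    Could it really be that the project isn't closing its connections? But this project has been running for several months, and only in the last few days has mongo started crashing frequently because it runs out of fds.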
    comwrg
        15
    comwrg  
    OP
       Aug 2, 2019
    auser
        16
    auser  
       Aug 2, 2019 via iPhone
    docs.mongodb.com/v3.2/core/index-text/

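    I have a vague feeling the problem is here; my guess is that it is a design issue (misuse of the database). I don't know this database, so this is as far as I can help.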
    comwrg
        17
    comwrg  
    OP
       Aug 2, 2019
    @auser OK, thank you very much for the suggestions. I'll keep digging on my own :)
    ilucio
        18
    ilucio  
       Aug 2, 2019 via Android
    Set ulimit to 64000; it's covered in the official documentation.
    auser
        19
    auser  
       Aug 2, 2019 via iPhone
    @comwrg

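    If system load and disk I/O aren't high,
    just raise the file descriptor limit first.
    Please share the final outcome once you have one.
    The main question is why so many index files are being opened.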
    comwrg
        20
    comwrg  
    OP
       Aug 3, 2019 via Android
    @auser Yes, I've already set it to 200000.
    qq1340691923
        21
    qq1340691923  
       Jun 5, 2020
    @comwrg Come on, share the final outcome..
    comwrg
        22
    comwrg  
    OP
       Jun 5, 2020
    @qq1340691923
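    Mitigation: set ulimit very high. After I set it to 200000, the problem stopped occurring.
    I still don't know the actual cause, but my guess is that it is due to having too many collections (on the order of more than a hundred thousand).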
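A back-of-the-envelope check of that guess, as a sketch only. It assumes the WiredTiger storage engine (which the log above shows is in use) keeps one file per collection and one per index, and it takes the worst case where every one of those files is open at once. The thread does not give the real index counts or how many files were open at any moment, so the numbers below are illustrative.

```python
def estimate_wiredtiger_files(collections, indexes_per_collection=1):
    """Rough file count: one file per collection plus one per index.

    indexes_per_collection defaults to 1 because every collection has at
    least the _id index.
    """
    return collections * (1 + indexes_per_collection)


def fits_in_limit(collections, nofile_limit, indexes_per_collection=1,
                  connections=0):
    """Worst case: all data files open at once, plus one fd per connection."""
    needed = estimate_wiredtiger_files(collections, indexes_per_collection)
    return needed + connections <= nofile_limit


if __name__ == "__main__":
    print(estimate_wiredtiger_files(100_000))   # files for 100k collections
    print(fits_in_limit(100_000, 64_000))       # against the old 64000 limit
```

Under these assumptions, a hundred thousand collections could need far more descriptors than a 64000 limit allows, which is at least consistent with the crashes stopping after the limit was raised.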