[Asking for a handout] Is there an open-source Chinese sentence segmentation project, in C++ or Python? - V2EX

  •   a41050447 Feb 1, 2019 via iPhone 5541 views
    I need to split a document into sentences. Customizable rules would be ideal.
    8 replies    2019-02-02 07:24:03 +08:00
    Sanko  
       Feb 1, 2019 via Android
    jieba
    xemtof  
       Feb 1, 2019
    @Sanko jieba does word segmentation. I don't think it does sentence segmentation.
    neptuno  
       Feb 1, 2019
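    Sentence splitting? Just split on punctuation and newlines? I think this kind of thing is better written yourself. Isn't word segmentation the real problem?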
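The "split on punctuation and newlines" idea can be sketched in a few lines of Python. This is only a minimal illustration, not code from the thread: the function name `naive_split` and the choice of terminator characters are assumptions.

```python
import re

# Sentence-ending punctuation (Chinese and ASCII) plus newlines.
# This terminator set is an assumption; adjust it to your own rules.
_SPLIT_RE = re.compile(r'([。！？!?；;…]+|\n+)')


def naive_split(text):
    """Split after each run of terminators, keeping the punctuation."""
    # With a capture group, re.split returns [body, delim, body, delim, ..., body].
    parts = _SPLIT_RE.split(text)
    sentences = []
    for i in range(0, len(parts), 2):
        body = parts[i]
        delim = parts[i + 1] if i + 1 < len(parts) else ''
        sentence = (body + delim).strip()
        if sentence:
            sentences.append(sentence)
    return sentences


if __name__ == '__main__':
    for s in naive_split('今天天气很好。我们去公园吧！\n你觉得呢？'):
        print(s)
```

This version knows nothing about quotes or brackets, so `他说：“你好！”然后走了。` is cut right after `！`, leaving the closing quote at the start of the next piece.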
    inhzus  
       Feb 1, 2019   1
    I might need this myself later, so I wrote a quick version.

    Third-party library used: [HanLP](https://github.com/hankcs/HanLP)

    Code:
    https://gist.github.com/imagecser/ea03d286838fb9afe7e20fba46c4ecd2

    Result:


    If you really need Python, take a look at pyhanlp.
    a41050447  
    OP
       Feb 1, 2019 via iPhone
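    @neptuno The main problem is handling all the rules: parentheses, quotation marks, doubled punctuation, decimal points, URLs and so on. The documents may also mix Chinese and English. Reinventing the wheel takes too much time.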
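To make those rules concrete, here is a rough rule-based sketch in plain Python. It is not the gist linked above and not any library's algorithm. All names (`split_sentences`, `PROTECTED_RE`, and so on) and the exact rules are assumptions: punctuation inside brackets or Chinese quotes does not split, runs such as `？！` stay together, decimals and URLs are protected, and an ASCII period only ends a sentence when followed by whitespace or end of text.

```python
import re

TERMINATORS = set('。！？!?…')  # always end a sentence when outside brackets
OPENERS = {'（': '）', '(': ')', '“': '”', '「': '」', '《': '》', '【': '】'}
CLOSERS = set(OPENERS.values())
# Spans whose punctuation never ends a sentence: URLs and decimal numbers.
PROTECTED_RE = re.compile(
    r'https?://[^\s\u4e00-\u9fff，。！？；：、（）“”《》【】]+|\d+\.\d+')


def _protected_indices(text):
    """Character positions that sit inside a URL or a decimal number."""
    indices = set()
    for m in PROTECTED_RE.finditer(text):
        end = m.end()
        # Trailing ASCII punctuation belongs to the sentence, not to the URL.
        while end > m.start() and text[end - 1] in '.,!?;:':
            end -= 1
        indices.update(range(m.start(), end))
    return indices


def _ends_sentence(text, i):
    ch = text[i]
    if ch in TERMINATORS:
        return True
    # ASCII period: only when followed by whitespace or end of text.
    return ch == '.' and (i + 1 == len(text) or text[i + 1].isspace())


def split_sentences(text):
    protected = _protected_indices(text)
    sentences, stack = [], []
    start, i, n = 0, 0, len(text)

    def flush(end):
        nonlocal start
        piece = text[start:end].strip()
        if piece:
            sentences.append(piece)
        start = end

    while i < n:
        ch = text[i]
        if i in protected:
            i += 1
            continue
        if ch == '\n':
            # A newline always ends a sentence and resets unbalanced brackets.
            flush(i)
            stack.clear()
        elif ch in OPENERS:
            stack.append(OPENERS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        elif not stack and _ends_sentence(text, i):
            # Absorb doubled punctuation and stray closing quotes/brackets.
            j = i + 1
            while (j < n and j not in protected
                   and (text[j] in TERMINATORS or text[j] in CLOSERS)):
                j += 1
            flush(j)
            i = j
            continue
        i += 1
    flush(n)
    return sentences


if __name__ == '__main__':
    sample = '价格是3.5元。详情见https://example.com/a.html。好吗？！'
    for s in split_sentences(sample):
        print(s)
```

Known gaps in this sketch: English abbreviations such as "e.g. " still trigger a split, a multi-sentence quotation is kept as one sentence, and an unclosed bracket suppresses splitting until the next newline.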
    neptuno  
       Feb 1, 2019
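    @a41050447 Right. You could try segmenting into words first and then splitting into sentences. The key point is that you want custom rules, and some ready-made wheels will leave you with bigger pitfalls later.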
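The "segment first, then split" suggestion could look roughly like this. The regex tokenizer below is only a stand-in so that the example runs without dependencies. The names `tokenize` and `group_sentences` are hypothetical. A token list from a real word segmenter could be passed to `group_sentences` instead, though how that segmenter treats numbers and punctuation would need checking.

```python
import re

# Stand-in tokenizer (an assumption for this sketch): decimal numbers and
# ASCII words stay whole; every other non-space character is its own token.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|[A-Za-z]+|\S')


def tokenize(text):
    return TOKEN_RE.findall(text)


def group_sentences(tokens, terminators=frozenset('。！？!?.')):
    """Group tokens into sentences, closing after each run of terminators."""
    sentences, current = [], []
    for idx, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[idx + 1] if idx + 1 < len(tokens) else None
        # Close only when the next token is not also a terminator,
        # so doubled punctuation such as "？！" stays in one sentence.
        if tok in terminators and nxt not in terminators:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == '__main__':
    for sent in group_sentences(tokenize('圆周率约为3.14。真的吗？！')):
        print(sent)
```

Because `3.14` arrives as a single token, its decimal point is never seen as a terminator, which is the main advantage of working on tokens rather than characters. Domain names such as `v2ex.com` would still need their own token rule.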
    yuikns  
       Feb 2, 2019
    Usually you run lexical analysis and then work from the complete structure it gives you, right?

    Two packages worth a look:

    http://thulac.thunlp.org/

    https://stanfordnlp.github.io/CoreNLP/