My TensorFlow implementation of "A Neural Conversational Model", a deep-learning chatbot - V2EX
    fendouai_com, Sep 5, 2017, 5813 views

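    The "I" in the title refers to the author of the DeepQA project. This follows on from the previous post (t/388328) and translates part of the project's documentation.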

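    Overview

    This work tries to reproduce the results of the paper A Neural Conversational Model (aka the Google chatbot). It uses a recurrent neural network (seq2seq model) for sentence prediction. It is written in Python with TensorFlow.

    The corpus-loading part of the program is inspired by the Torch neuralconvo from macournoyer.

    For now, DeepQA supports several dialogue corpora, including the Cornell dataset that ships with the project; the full list is in the project readme.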

    To speed up training, it is also possible to use pre-trained word embeddings (thanks to Eschnou). More info here.

    Installation

    The program requires the following dependencies (easy to install using pip: pip3 install -r requirements.txt):

    • python 3.5
    • tensorflow (tested with v1.0)
    • numpy
    • CUDA (for using GPU)
    • nltk (natural language toolkit, for tokenizing the sentences)
    • tqdm (for the nice progress bars)

    You may also need to download additional data to make nltk work.

    python3 -m nltk.downloader punkt 

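    The Cornell dataset is already included. For the other datasets, see the readme files in their respective folders (inside data/).

    The web interface requires some additional packages: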

    • django (tested with 1.10)
    • channels
    • Redis (see here)
    • asgi_redis (at least 1.0)

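    A Docker installation is also supported. For more detailed instructions, see here.

    Running

    Chatbot

    To train the model, simply run main.py. Once training is finished, you can test the results with main.py --test (results are generated in 'save/model/samples_predictions.txt') or with main.py --test interactive (more fun).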

    Here are some flags which could be useful. For more help and options, use python main.py -h:

    • --modelTag <name>: gives the current model a name so that models can be told apart when testing/training.
    • --keepAll: use this flag when training if, when testing, you want to see the predictions at different steps (it can be interesting to watch the program change its name and age as the training progresses). Warning: this can quickly take a lot of storage space if you don't increase the --saveEvery option.
    • --filterVocab 20 or --vocabularySize 30000: limit the vocabulary size to optimize performance and memory usage. Words used fewer than 20 times are replaced by the <unknown> token, and a maximum vocabulary size is set.
    • --verbose: when testing, will print the sentences as they are computed.
    • --playDataset: show some dialogue samples from the dataset (can be used together with --createDataset if this is the only action you want to perform).
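The vocabulary filtering described for --filterVocab and --vocabularySize can be illustrated with a minimal sketch. This is not the project's code: the function names `build_vocab` and `replace_rare` are hypothetical, and applying both limits in a single pass is an assumption made for illustration.

```python
from collections import Counter

UNKNOWN = "<unknown>"  # token name taken from the flag description above


def build_vocab(sentences, min_count=20, max_size=30000):
    """Return the set of words to keep (hypothetical helper).

    A word is kept if it appears at least `min_count` times; if more than
    `max_size` words qualify, only the most frequent ones are kept.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return {w for w, _ in frequent[:max_size]}


def replace_rare(sentences, vocab):
    """Replace every out-of-vocabulary word with the unknown token."""
    return [[w if w in vocab else UNKNOWN for w in s] for s in sentences]


if __name__ == "__main__":
    # Tiny toy corpus; a low threshold is used so the effect is visible.
    corpus = [["hi", "there"], ["hi", "laura"], ["hi", "there", "friend"]]
    vocab = build_vocab(corpus, min_count=2, max_size=30000)
    print(replace_rare(corpus, vocab))
```

With the toy corpus, only "hi" and "there" appear at least twice, so "laura" and "friend" are replaced by the unknown token.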

    To visualize the computational graph and the cost with TensorBoard, just run tensorboard --logdir save/.

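    By default, the network architecture is a standard encoder/decoder with two LSTM layers (hidden size of 256) and an embedding size of 32 for the vocabulary. The network is trained using ADAM. The maximum sentence length is set to 10 words, but it can be increased.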
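To get a feel for the size of the default architecture, the sketch below counts the parameters of one stack of two LSTM layers. It assumes a standard LSTM cell (four gates, no peepholes, no projection) and ignores the embedding matrix and the output layer; whether the project's cells match this exactly is not stated in the post, and the function names are hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    """Weights and biases of one standard LSTM layer.

    Each of the 4 gates has a weight matrix over the concatenated
    [input, previous hidden state] plus one bias per hidden unit.
    """
    return 4 * (input_size + hidden_size + 1) * hidden_size


def stacked_lstm_params(embedding_size, hidden_size, num_layers):
    """Parameters of a stack of LSTM layers fed by word embeddings."""
    total = 0
    input_size = embedding_size
    for _ in range(num_layers):
        total += lstm_layer_params(input_size, hidden_size)
        input_size = hidden_size  # upper layers read the layer below
    return total


if __name__ == "__main__":
    # Defaults quoted in the post: embedding size 32, hidden size 256, 2 layers.
    print(stacked_lstm_params(embedding_size=32, hidden_size=256, num_layers=2))  # 821248
```

Under these assumptions the first layer has 4 × (32 + 256 + 1) × 256 = 295,936 parameters and the second has 4 × (256 + 256 + 1) × 256 = 525,312, for 821,248 per stack.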

    Results

    Of course, this network is not going to be very good at chatting:

    Q: Hi
    A: Hi.
    Q: What is your name ?
    A: Laura.
    Q: What does that mean ?
    A: I dunno.
    Q: How old are you ?
    A: thirty-five.
    Q: Will Google hire me ?
    A: No.
    Q: Tell me the alphabet
    A: Fuck you.
    Q: That's not nice
    A: Yeah.

    Here are some cases where it fails to answer correctly:

    Q: Who is Laura ?
    A: My brother.
    Q: Say 'goodbye'
    A: Alright.
    Q: What is cooking ?
    A: A channel.
    Q: Can you say no ?
    A: No.
    Q: Two plus two
    A: Manny...

    Pretrained model

    Project screenshot:

    chatbot_miniature.png

    Screenshot from my own test run:

    Screenshot from 2017-09-05 14-47-52.png

    Step-by-step tutorial:

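    1. Download the project: https://github.com/Conchylicultor/DeepQA
    2. Download the pretrained model: https://drive.google.com/file/d/0Bw-phsNSkq23OXRFTkNqN0JGUU0/view (if the link does not open, I will upload the file to Baidu Netdisk tonight and share it at http://www.tensorflownews.com/)
    3. Unpack it and place it in the project's save directory, as shown: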

    Screenshot from 2017-09-05 14-52-13.png

    4. Copy the file save/model-pretrainedv2/dataset-cornell-old-lenght10-filter0-vocabSize0.pkl to data/samples/

    As shown:

    Screenshot from 2017-09-05 14-55-00.png

    5. Run the following command in the project directory:

    python3 main.py --modelTag pretrainedv2 --test interactive 

    Once the program has loaded the pretrained model, it looks like this:

    Screenshot from 2017-09-05 14-57-14.png
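Steps 4 and 5 above can also be scripted. The sketch below is a convenience wrapper, not part of DeepQA: the function name `prepare_pretrained` is hypothetical, and it assumes the pretrained model has already been unpacked into save/model-pretrainedv2 as in step 3. The file name is copied verbatim from step 4, including its original spelling.

```python
import shutil
from pathlib import Path

MODEL_TAG = "pretrainedv2"
# File name exactly as given in step 4 of the tutorial.
DATASET_FILE = "dataset-cornell-old-lenght10-filter0-vocabSize0.pkl"


def prepare_pretrained(project_dir):
    """Copy the bundled dataset file into data/samples/ (step 4).

    Returns the command from step 5 as an argument list.
    """
    project = Path(project_dir)
    source = project / "save" / f"model-{MODEL_TAG}" / DATASET_FILE
    if not source.is_file():
        raise FileNotFoundError(f"Unpack the pretrained model first: {source}")
    target_dir = project / "data" / "samples"
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, target_dir / DATASET_FILE)
    return ["python3", "main.py", "--modelTag", MODEL_TAG, "--test", "interactive"]
```

From the project root, passing the returned list to `subprocess.run(..., check=True)` would then start the interactive session in the same way as the command in step 5.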

    Chatbot resource collection

    Projects, corpora, papers, tutorials: https://github.com/fendouai/Awesome-Chatbot

    More tutorials:

    http://www.tensorflownews.com/

    DeepQA

    https://github.com/Conchylicultor/DeepQA

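    Note: to make the project easier to understand, the description above translates part of the project's readme and mainly covers running the project with the pretrained data.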

    11 replies    2017-09-05 22:13:40 +08:00
    #1 peterontheway, Sep 5, 2017
    kind of silly, but I'd like to try :)

    #2 fendouai_com (OP), Sep 5, 2017
    @shihty5 just do it!

    #3 oott123, Sep 5, 2017
    Q: Can you say no?
    A: No.

    This one cracked me up

    #4 peterontheway, Sep 5, 2017
    @oott123 A paradox, haha

    #5 kyleyan, Sep 5, 2017
    Neural chat [smirk]

    #6 assad, Sep 5, 2017
    Haha, does it understand Chinese?

    #7 silencefent, Sep 5, 2017
    Two lunatics having a chat

    #8 Actrace, Sep 5, 2017
    @silencefent Nothing wrong with that.

    #9 lunatic5, Sep 5, 2017
    Can it speak Chinese?

    #10 keysona, Sep 5, 2017
    Sorry, OP.

    I misread it as "lunatics chatting"...

    #11 fendouai_com (OP), Sep 5, 2017
    @lunatic5 No, there is no Chinese dataset. I will post a Chinese chatbot later.