Two questions about running neural-style on an international-edition Azure NC6 VM, mainly about Total memory #TensorFlow# #Azure VM NC6# - V2EX
taurenshaman    2017-07-03 09:59:17 +08:00    4240 views
I've recently been playing with https://github.com/anishathalye/neural-style
Reference I followed: http://blog.csdn.net/v_july_v/article/details/52658965

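To skip the pitfalls of compiling and installing everything myself (mainly because my local machine is too weak), I used an NC6 virtual machine on the international edition of Azure. I uploaded the code and model in advance to an Azure Storage file share, then mounted the SMB share locally. A quick note on the NC series:
NC series: NVIDIA K80 GPU. Dual GPU, 4992 CUDA cores, 24 GB of video memory, 2.91 TFLOPS double precision, 8.73 TFLOPS single precision.
NC6: 6 cores + 56 GiB RAM + 340 GiB disk + 1X K80. $0.9/hour.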

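Finally, running it. Microsoft's tooling really does follow the point-and-shoot camera philosophy: after cd-ing into /mnt/mosp, you can run it directly: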
    python neural_style.py --content ./source/WP_20170128_09_12_22_Rich.jpg --styles ./starry-sky.jpg --output ./result/WP_20170128_09_12_22_Rich.jpg

Before describing my problems, here is the log output:

    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
    name: Tesla K80
    major: 3 minor: 7 memoryClockRate (GHz) 0.8235
    pciBusID 9909:00:00.0
    Total memory: 11.17GiB
    Free memory: 11.11GiB
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
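One possible reading of the Total memory figure in this log, offered as a rough sketch and not a confirmed explanation: the 24 GB quoted for the K80 is the total for the whole board, which carries two GPUs, and the log lists a single device (device 0). Assuming the NC6 exposes only one of those two GPUs, the per-GPU share works out as below. The remaining gap between 12 and the reported 11.17 GiB is not explained by this arithmetic; overhead such as ECC is a guess that the log does not confirm.

```python
# Board-level figures quoted for the NC series (whole K80 card, two GPUs).
BOARD_MEMORY_GB = 24
BOARD_CUDA_CORES = 4992
GPUS_PER_BOARD = 2

# Figure reported by TensorFlow for device 0 in the log.
REPORTED_TOTAL_GIB = 11.17

# Assumption: the VM sees only one of the two GPUs on the board.
per_gpu_memory = BOARD_MEMORY_GB / GPUS_PER_BOARD
per_gpu_cores = BOARD_CUDA_CORES // GPUS_PER_BOARD
reported_share = REPORTED_TOTAL_GIB / per_gpu_memory

print(f"memory per GPU: {per_gpu_memory:.0f} GB")
print(f"CUDA cores per GPU: {per_gpu_cores}")
print(f"reported / per-GPU memory: {reported_share:.1%}")
```

Under that assumption the reported figure is roughly 93% of a single GPU's share, which is far closer than either the 24 GB or the 56 GiB figure.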


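Question 1: There are six warnings above, covering SSE3, SSE4.1, SSE4.2, AVX, AVX2, and FMA, all of them about CPU computation. When the work runs mainly on the GPU, do I need to enable them? If so, do I need to rebuild TensorFlow with support for them?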
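If the six warnings turn out to be only noise for a GPU-bound job, one option is to hide them instead of rebuilding. A minimal sketch, assuming the TensorFlow build in use honours the TF_CPP_MIN_LOG_LEVEL environment variable, which has to be set before TensorFlow is imported. This only suppresses the messages; it does not enable the instructions.

```python
import os

# Must be set before `import tensorflow`.
# "0" shows everything, "1" hides INFO, "2" hides INFO and WARNING,
# "3" additionally hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_CPP_MIN_LOG_LEVEL"])
```

Note that level "2" would also hide the INFO lines, including the device summary with Total memory, so "0" or "1" may be preferable while debugging.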

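Question 2 (the main one): Total memory is only a little over 11 GiB, while the NC6 has 56 GiB of RAM and 24 GiB of video memory; neither matches the Total memory figure. I checked available memory with the top command at the time and it was around 55 GiB (that was two days ago and I don't remember exactly, but it was definitely 50+). Do I need to change some configuration? Or should I change the GPU memory allocation directly in neural-style's Python code via config = tf.ConfigProto()?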
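For reference, a minimal sketch of what the tf.ConfigProto() route looks like in TensorFlow 1.x. The helper name make_session_config is hypothetical and is not part of neural-style. These options control how TensorFlow allocates memory on the device it sees; as far as I know they cannot raise the ceiling above the Total memory reported in the log.

```python
def make_session_config(allow_growth=True, memory_fraction=None):
    """Build a TF 1.x session config (hypothetical helper, not from neural-style)."""
    # TensorFlow 1.x assumed; TF 2.x exposes this as tf.compat.v1.ConfigProto.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving almost all of it up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Cap this process at a fraction (between 0 and 1) of the GPU's memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage inside a TF 1.x script:
# import tensorflow as tf
# with tf.Session(config=make_session_config()) as sess:
#     sess.run(...)
```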


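Finally, my test scenarios:
Image 1: 300x369. With the default 1000 iterations, each iteration took just under a second.
Image 2:
A photo from my phone, 2960x5258: OutOfMemory on the first iteration.
Halved to 1480x2629: OutOfMemory on the first iteration.
Shrunk again to 740x1315: it iterates, at just under three seconds per iteration -_-
Without changing anything, the limit seems to be somewhere around 1024x768 or 1280x800. But using only 11 GiB is clearly wasteful, and it can't handle high-resolution images.
Thanks!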
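To put the resolutions from these test runs side by side, here is a small sketch comparing pixel counts. It assumes, as a rough rule of thumb only, that the memory needed for the network's activations grows about linearly with the number of pixels; no measured memory figures are used.

```python
# Resolutions from the test runs: (width, height, outcome).
runs = [
    (300, 369, "ok"),
    (740, 1315, "ok"),
    (1480, 2629, "OutOfMemory"),
    (2960, 5258, "OutOfMemory"),
]

# Largest size that was observed to work.
baseline = 740 * 1315


def pixel_ratio(width, height):
    """Pixel count relative to the 740x1315 run."""
    return width * height / baseline


for width, height, outcome in runs:
    print(f"{width}x{height}: {width * height:>10,} px, "
          f"{pixel_ratio(width, height):5.2f}x baseline, {outcome}")
```

The halved photo still has about 4 times the pixels of the size that worked, and the original about 16 times, so under the linear assumption both would need several times the memory of the 740x1315 run.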

    ***
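PS. I also asked this on Zhihu; answers are welcome there too: https://www.zhihu.com/question/61931733

PS. Here is my price comparison of Alibaba Cloud and Azure, covering only the compute-oriented offerings:
https://my.worktile.com/share/tasks/9f5b1ca2560c45dc9bd46e2cb7b4b379
Password: 1234

O(∩_∩)O Thanks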
Addendum 1    2017-07-05 11:27:23 +08:00
1 reply    2017-07-03 10:05:39 +08:00
#1    taurenshaman (OP)    2017-07-03 10:05:39 +08:00
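Here is the foolproof workflow:
0. The basic idea is cheap and convenient: delete the machine when done, but keep the code and model. Rebuilding the machine also shouldn't mean re-downloading the model each time (several hundred MB is too large and wastes resources).
1. Upload the code and model to an Azure Storage file share.
2. On Ubuntu, create a mount point directory: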
    sudo mkdir -p /mnt/mosp
3. Mount the SMB share to the local directory (Azure has documentation for all of this; it's simple):
    sudo mount -t cifs //****.file.core.windows.net/neural-style-van-gogh /mnt/mosp -o vers=3.0,username=****,password=****,dir_mode=0777,file_mode=0777
4. cd into /mnt/mosp
5. Run:
    python neural_style.py --content ./source/WP_20170128_09_12_22_Rich.jpg --styles ./starry-sky.jpg --output ./result/WP_20170128_09_12_22_Rich.jpg
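The steps above could also be scripted. A minimal sketch that only builds the command lines and prints them; the function names are hypothetical, and the masked **** parts of the original commands are left as they are.

```python
import shlex

MOUNT_POINT = "/mnt/mosp"


def mount_command(share_url, username, password, mount_point=MOUNT_POINT):
    """argv for mounting the Azure file share over SMB (step 3)."""
    options = (f"vers=3.0,username={username},password={password},"
               "dir_mode=0777,file_mode=0777")
    return ["sudo", "mount", "-t", "cifs", share_url, mount_point, "-o", options]


def style_command(content, style, output):
    """argv for the neural-style run (step 5)."""
    return ["python", "neural_style.py",
            "--content", content, "--styles", style, "--output", output]


steps = [
    ["sudo", "mkdir", "-p", MOUNT_POINT],  # step 2
    mount_command("//****.file.core.windows.net/neural-style-van-gogh",
                  "****", "****"),
    style_command("./source/WP_20170128_09_12_22_Rich.jpg",
                  "./starry-sky.jpg",
                  "./result/WP_20170128_09_12_22_Rich.jpg"),
]

for argv in steps:
    # Printed only. To execute, pass argv to subprocess.run(argv, check=True),
    # using cwd=MOUNT_POINT for the last step.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping each command as an argv list avoids shell-quoting problems if a password or file name contains spaces or special characters.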