V2EX nanshan2012
 nanshan2012's recent timeline updates
nanshan2012

nanshan2012

V2EX member #432578, joined on 2019-07-28 22:49:57 +08:00
Today's activity rank 12596
nanshan2012's recent replies
@diudiuu 不错不错,修正之后正常了,牛!!!
@KaiWuBOSS 使用 V0.26 版本是正常的,使用 V0.31 版本就出现报错了,请大神看看

本地大模型部署器 vv0.3.1 llama.cpp b8864
by llmbbs.ai 本地 AI 技术社区

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (SM89, 8188 MB VRAM, 272 GB/s)
RAM: 95 GB DDR5
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3.6-35B-A3B (moe, 38B total / 1B active)
Quant: Q4_K_M (20.6 GB)
Mode: moe_partial
Accel: Flash Attention + SWA-Full (hybrid arch)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3.6-35B-A3B-UD-Q4_K_M.gguf [cached]

[4/6] Preflight check...
VRAM sufficient

[5/6] Warmup benchmark...
旧缓存格式,重新探测
Probe 1: ctx=256K ... OOM
Probe 2: ctx=128K ... OOM
Probe 3: ctx=64K ... OOM
Probe 4: ctx=32K ... OOM
Probe 5: ctx=16K ... OOM
Probe 6: ctx=8K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
显存不足,降低上下文至 4K 重试...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 连续 2 次启动失败,即使最小上下文(4K)也无法运行

NVIDIA GeForce RTX 4060 Laptop GPU: 8188 MB VRAM
模型 Qwen3.6-35B-A3B: ~21104 MB
KV cache (4K, q4_0): ~80 MB
预估总需: ~22208 MB

差额: 14020 MB

建议:
1. 选择更小的量化 (Q4_K_M 或 Q2_K)
2. 选择更小的模型

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int 手动指定上下文大小( 0=自动)
--fast Skip warmup, use cached profile
-h, --help help for run
--host string 监听地址(默认 127.0.0.1 ,用 0.0.0.0 开放局域网) (default "127.0.0.1")
--llama-server string 使用自定义 llama-server 二进制(完整路径)
--mode string 模式选择: speed/balanced/context (默认用上次选择)
--reset 清除缓存,重新 warmup 探测最优参数
做得不错,提点建议。

在显存为 8G 的 4060 GPU 上跑 Qwen 3.6 35B MoE 模型,通过 offload 方式可以实现吞吐量在 20 token/s 以上,但网页提示的信息似乎有出入,请确认。
加油,腾讯公众号内容也覆盖上
Mar 28, 2023
Replied to a topic by poyanhu 推广 GPT4 上线,送 30 张 199 次对话卡密
MTg4MDM3OTZAcXEuY29t
About     Help     Advertise     Blog     API     FAQ     Solana     3207 Online   Highest 6679       Select Language
创意工作者们的社区
World is powered by solitude
VERSION: 3.9.8.5 28ms UTC 12:33 PVG 20:33 LAX 05:33 JFK 08:33
Do have faith in what you're doing.
ubao msn snddm index pchome yahoo rakuten mypaper meadowduck bidyahoo youbao zxmzxm asda bnvcg cvbfg dfscv mmhjk xxddc yybgb zznbn ccubao uaitu acv GXCV ET GDG YH FG BCVB FJFH CBRE CBC GDG ET54 WRWR RWER WREW WRWER RWER SDG EW SF DSFSF fbbs ubao fhd dfg ewr dg df ewwr ewwr et ruyut utut dfg fgd gdfgt etg dfgt dfgd ert4 gd fgg wr 235 wer3 we vsdf sdf gdf ert xcv sdf rwer hfd dfg cvb rwf afb dfh jgh bmn lgh rty gfds cxv xcv xcs vdas fdf fgd cv sdf tert sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf shasha9178 shasha9178 shasha9178 shasha9178 shasha9178 liflif2 liflif2 liflif2 liflif2 liflif2 liblib3 liblib3 liblib3 liblib3 liblib3 zhazha444 zhazha444 zhazha444 zhazha444 zhazha444 dende5 dende denden denden2 denden21 fenfen9 fenf619 fen619 fenfe9 fe619 sdf sdf sdf sdf sdf zhazh90 zhazh0 zhaa50 zha90 zh590 zho zhoz zhozh zhozho zhozho2 lislis lls95 lili95 lils5 liss9 sdf0ty987 sdft876 sdft9876 sdf09876 sd0t9876 sdf0ty98 sdf0976 sdf0ty986 sdf0ty96 sdf0t76 sdf0876 df0ty98 sf0t876 sd0ty76 sdy76 sdf76 sdf0t76 sdf0ty9 sdf0ty98 sdf0ty987 sdf0ty98 sdf6676 sdf876 sd876 sd876 sdf6 sdf6 sdf9876 sdf0t sdf06 sdf0ty9776 sdf0ty9776 sdf0ty76 sdf8876 sdf0t sd6 sdf06 s688876 sd688 sdf86