V2EX kevan
kevan's recent timeline updates
6t6t.com 风之轩 Easy Listening
V2EX member #21701, joined on 2012-05-31 17:34:09 +08:00
Selling JD Plus 2-hour housekeeping, 40, Y2tja2xvaw==
Second-hand Trading    kevan    8 days ago    Last replied by kevan
9 replies
Selling 6T6T.COM
Domain Names    kevan    Apr 10    Last replied by yiihub
2 replies
Selling JD Plus 2-hour housekeeping
Second-hand Trading    kevan    Feb 2    Last replied by imsoso
1 reply
Selling JD 2-hour housekeeping
Second-hand Trading    kevan    Dec 9, 2025
Hi, welcome to my little music site 风之轩 Easy Listening
Music    kevan    May 31, 2012    Last replied by kevan
16 replies
kevan's recent replies
@iovekkk I have the same setup; even 27B is a struggle. For a smooth experience, 9B is the only option.
Still can't use it.
@KaiWuBOSS I'll try it the moment I get home from work and report back. I've been following this for ages, hahaha. Count me in.
Could you optimize a build that works on the 50 series?
@KaiWuBOSS Boss, why does it still fail after I downloaded version 0.1.6??

>kaiwu run Qwen3-30B-A3B-UD-Q3_K_XL.gguf --reset

Local LLM Deployer v0.1.6  llama.cpp b8864
by llmbbs.ai, a local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 896 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64
CUDA 13.2 detected: known bug with low-bit quantization
If you see garbled output, downgrade driver to CUDA 13.1
Warning: RTX 50 series with CUDA 13.2 detected
Kaiwu will use CUDA 12.4 binary for stability.

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 29B total / 2B active)
Quant: Q3_K_M (12.9 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
First launch on an RTX 50 series card requires JIT compilation (~30s), please wait...
llama-server does not support iso3, falling back to q8_0/q4_0
VRAM sufficient

[5/6] Warmup benchmark...
Cache cleared, re-probing
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed twice in a row; even the minimum context (4K) will not run

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~13189 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~14309 MB

Suggestions:
1. Choose a smaller quantization (Q4_K_M or Q2_K)
2. Choose a smaller model

Usage:
kaiwu run <model> [flags]

Flags:
--bench                   Run benchmark after starting
--ctx-size int            Manually set the context size (0 = auto)
--fast                    Skip warmup, use cached profile
-h, --help                help for run
--llama-server string     Use a custom llama-server binary (full path)
--reset                   Clear the cache and redo warmup probing for optimal parameters


C:\Kevan\AI\kaiwu-windows-amd64>
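For readers wondering where "~14309 MB" comes from: the error report is simple arithmetic, model weights plus KV cache plus a fixed overhead, compared against total VRAM. Below is a minimal sketch of that estimate, assuming a flat 1024 MB overhead; the function names and that constant are guesses, not kaiwu's actual code, though 13189 + 96 + 1024 does reproduce the 14309 MB printed here (and 14336 + 96 + 1024 the 15456 MB in the run further down).

# Sketch of the VRAM-fit estimate implied by kaiwu's error report.
# Function names and the 1024 MB overhead are assumptions.

def kv_cache_mb(ctx_tokens, n_layers=48, n_kv_heads=4, head_dim=128,
                bytes_per_elem=0.5):
    # K and V caches for Qwen3-30B-A3B (48 layers, 4 KV heads,
    # head dim 128). Treating q4_0 as ~0.5 byte/element reproduces
    # the "~96 MB" figure in the log for a 4K context.
    return (2 * n_layers * n_kv_heads * head_dim * ctx_tokens
            * bytes_per_elem) / 2**20

def estimated_need_mb(model_mb, ctx_tokens, overhead_mb=1024):
    return model_mb + kv_cache_mb(ctx_tokens) + overhead_mb

# Numbers from the transcript above: ~13189 MB weights, 16303 MB VRAM.
print(f"~{estimated_need_mb(13189, 4096):.0f} MB needed vs 16303 MB VRAM")

Note that the estimate fits (14309 < 16303), which is why preflight printed "VRAM sufficient" even though every real probe then hit OOM; whatever the q8_0/q4_0 fallback and the CUDA 12.4 binary actually allocate is evidently larger than this back-of-the-envelope figure.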
@KaiWuBOSS I honestly had no interest at first, but your introduction made me feel reborn. Idiot-proof installation, you said. Could you recommend concrete install steps? The ones you described are too advanced for me.
kaiwu.exe run Qwen3-30B-A3B

Local LLM Deployer v0.1.2  llama.cpp b8864
by llmbbs.ai, a local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 0 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 30B total / 3B active)
Quant: ud-q3-k-xl (14.0 GB)
Mode: full_gpu
Accel: Flash Attention + MTP (native)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3, falling back to q8_0/q4_0
VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed twice in a row; even the minimum context (4K) will not run

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~14336 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~15456 MB

Suggestions:
1. Choose a smaller quantization (Q2_K)
2. Choose a smaller model
3. Use a MoE offload model (experts in CPU RAM)

Usage:
kaiwu run <model> [flags]

Flags:
--bench                   Run benchmark after starting
--ctx-size int            Manually set the context size (0 = auto)
--fast                    Skip warmup, use cached profile
-h, --help                help for run
--llama-server string     Use a custom llama-server binary (full path)
--reset                   Clear the cache and redo warmup probing for optimal parameters
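Both transcripts show the same warmup strategy: probe the largest context first, step down on OOM, and give up below a floor, after which server startup retries the fallback once more before aborting. A rough sketch of that control flow follows, with try_ctx stubbed out, since kaiwu's real probe (launching llama-server-cuda.exe and watching for a CUDA out-of-memory error) is not shown in this thread.

# Rough sketch of the warmup probe loop from steps [5/6] and [6/6].
CTX_PROBES = [8192, 4096]  # "Probe 1: ctx=8K", "Probe 2: ctx=4K"

def try_ctx(ctx_tokens):
    # Placeholder: a real probe would start llama-server with
    # "-c <ctx_tokens>" and check whether allocation succeeds.
    # Stubbed to reproduce the transcripts, where every probe
    # down to 4K hits OOM on this 16 GB card.
    return False

def warmup():
    for i, ctx in enumerate(CTX_PROBES, start=1):
        print(f"Probe {i}: ctx={ctx // 1024}K ...", end=" ")
        if try_ctx(ctx):
            print("ok")
            return ctx
        print("OOM")
    print("Warmup failed: all ctx probes failed "
          f"(tried down to {CTX_PROBES[-1] // 1024}K)")
    return None

warmup()

Since even ctx=4K fails, forcing --ctx-size 4096 by hand would change nothing; suggestion 3 above (a MoE offload build that keeps the experts in CPU RAM) is the realistic route for a ~14 GB quant on a 16 GB card.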

Unbelievable. Why is it nothing like the introduction?
The introduction said it runs on even 8GB of VRAM; why does it fail on my 16GB?
Boss, quick question: which model suits my 5070 Ti 16GB + 32GB RAM? I want to use it to run 小龙虾
Apr 23
Replied to a topic by intoext    Industry Gossip    Is the great swindler Jia Yueting back in China?
If he comes back, he's guaranteed to get mobbed.