hongdengdao's recent timeline updates

hongdengdao

V2EX member #42707, joined on 2013-07-27 15:19:45 +08:00

Today's activity rank 12632


Evernote, Pocket, Wunderlist, LastPass, and New York Times one-year subscriptions, bundled at a discount for just US$59.90.

Share & Discover hongdengdao Jan 15, 2015 Last replied by codejay

China intends to tax citizens' overseas income. What does everyone think?

Share & Discover hongdengdao Jan 11, 2015 Last replied by sadaharu09

The Google Search doodle for the anniversary of the fall of the Berlin Wall is nice!!

Google hongdengdao Nov 10, 2014 Last replied by skydiver

What does everyone think of the news below? Are us peons only going to get toilet paper from now on?

Q&A hongdengdao Jul 10, 2014 Last replied by sesinx

LINE has been blocked; progress toward a nationwide LAN +1

LINE hongdengdao Jul 3, 2014 Last replied by simon7

Stack Overflow has been blocked by the firewall too... Is it the same for everyone?

Programmers hongdengdao Oct 13, 2014 Last replied by standin000

Can those of you V2EXers who bought Sketch in the group buy update to 3.0.2? Mine keeps looping at 3.0.1; after every update it is still 3.0.1.

macOS hongdengdao May 20, 2014 Last replied by Superoutman


hongdengdao's recent replies

Apr 24

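Replied to a topic by KaiWuBOSS Local LLM I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

Something seems off here. With two cards and 32 GB of VRAM in total, 8K to 64K context should definitely run.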

.\kaiwu.exe run Qwen3.6-27B-Q4_K_M.gguf

Local LLM deployer vv0.1.2 llama.cpp b8864
by llmbbs.ai, a local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 4060 Ti × 2 (SM89, 16380 MB VRAM each, 0 GB/s)
RAM: 61 GB DDR5
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3.6-27B (dense, 28B)
Quant: Q4_K_M (15.7 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3.6-27B-Q4_K_M.gguf [cached]

[4/6] Preflight check...
VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=256K ... OOM
Probe 2: ctx=128K ... OOM
Probe 3: ctx=64K ... OOM
Probe 4: ctx=32K ... OOM
Probe 5: ctx=16K ... OOM
Probe 6: ctx=8K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
llama-server does not support iso3; falling back to q8_0/q4_0
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM; lowering context to 64K and retrying...
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM; lowering context to 32K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: all 3 launch attempts failed; consider choosing a smaller model
Usage:
kaiwu run <model> [flags]
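
A rough back-of-the-envelope check of the claim in this reply, using only the figures printed in the log (16380 MB per card, a 15.7 GB quantized model). Treating the log's "GB" as GiB is an assumption, and `headroom_mib` is a hypothetical helper name. The sketch ignores the KV cache, compute buffers, and how layers are actually split across cards, so it only shows how much memory is left once the weights are loaded.

```python
MIB_PER_GIB = 1024  # assumption: the log's "GB" means GiB


def headroom_mib(vram_per_gpu_mib: int, gpu_count: int, model_gib: float) -> float:
    """Return VRAM (in MiB) left over after the model weights alone are loaded."""
    total_mib = vram_per_gpu_mib * gpu_count
    return total_mib - model_gib * MIB_PER_GIB


# Figures taken from the log: 16380 MB per card, 15.7 GB model file.
dual = headroom_mib(16380, 2, 15.7)
single = headroom_mib(16380, 1, 15.7)

print(f"two cards: {dual:.0f} MiB free after weights")   # two cards: 16683 MiB free after weights
print(f"one card:  {single:.0f} MiB free after weights")  # one card:  303 MiB free after weights
```

On these numbers, two cards leave roughly half of the combined VRAM free, while a single card leaves only about 300 MiB, which would be consistent with out-of-memory failures if the tool were placing the whole model on one GPU. Whether that is what happens here is not something the log confirms.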

Apr 24

Replied to a topic by KaiWuBOSS Local LLM I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

.\kaiwu.exe run Qwen3.6-27B-Q4_K_M.gguf

Local LLM deployer vv0.1.1 llama.cpp b8864
by llmbbs.ai, a local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 4060 Ti (SM89, 16380 MB VRAM, 0 GB/s)
RAM: 61 GB DDR5
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3.6-27B (dense, 28B)
Quant: Q4_K_M (15.7 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3.6-27B-Q4_K_M.gguf [cached]

[4/6] Preflight check...
VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
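llama-server does not support iso3; falling back to q8_0/q4_0
Waiting for llama-server to be ready (port 11434)...
Insufficient VRAM; lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive launch attempts failed; cannot run even at the minimum context (4K)
Suggestion: choose a smaller quantization or use a MoE offload model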
Usage:
kaiwu run <model> [flags]
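
The warmup probes in this log step the context size down by half until a floor is reached. A minimal sketch of that pattern follows. It is not kaiwu's actual implementation: `probe_context` and the `try_start` callback are hypothetical names standing in for whatever launches the server, and the 4K floor is taken from the "tried down to 4K" message.

```python
from typing import Callable, Optional


def probe_context(start_k: int, floor_k: int,
                  try_start: Callable[[int], bool]) -> Optional[int]:
    """Halve the context size (in K tokens) until try_start succeeds.

    Returns the first context size that works, or None if every size
    down to floor_k fails.
    """
    ctx_k = start_k
    while ctx_k >= floor_k:
        if try_start(ctx_k):
            return ctx_k
        ctx_k //= 2
    return None


# Fake launcher for illustration: pretend anything above 16K runs out of memory.
print(probe_context(256, 4, lambda ctx_k: ctx_k <= 16))  # 16

# Mirrors this log: both 8K and 4K fail, so no workable size is found.
print(probe_context(8, 4, lambda ctx_k: False))  # None
```

A search like this only helps when the failure really depends on context size. If every probe fails, as it does here, the cause is more likely the weights themselves not fitting than the KV cache.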

Apr 24

Replied to a topic by KaiWuBOSS Local LLM I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings