PS C:\Users\cfm880\Downloads> kaiwu run .\Qwen3-Coder-30B-APEX-I-Quality.gguf
Local LLM Deployer v0.1.6 llama.cpp b8864
by
llmbbs.ai Local AI Tech Community
[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 3080 Laptop GPU (SM86, 16384 MB VRAM, 760 GB/s)
RAM: 63 GB DDR4
OS: windows amd64
CUDA 13.2 detected: known bug with low-bit quantization
If you see garbled output, downgrade driver to CUDA 13.1
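If that warning fires, it is worth cross-checking what driver the probe actually saw; nvidia-smi reports the same information directly. This is a manual check, not something kaiwu runs for you:

PS C:\Users\cfm880\Downloads> nvidia-smi
# the banner line reports the driver version and the CUDA version it supports;
# the memory column should show the same 16384 MiB of VRAM the probe detected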
[2/6] Selecting configuration...
Model: Qwen3-Coder-30B-A3B-Instruct (MoE, 30B total / 3B active)
Quant: Q6_K (18.1 GB)
Mode: moe_offload (experts on CPU)
Accel: Flash Attention
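For reference, moe_offload roughly maps to a llama-server launch that keeps the MoE expert tensors in system RAM while the rest of the model stays in VRAM. The exact flags kaiwu passes are not shown in this log, so the command below is only a sketch of an equivalent manual llama.cpp invocation; flag spellings (especially for flash attention) vary between llama.cpp builds:

PS C:\Users\cfm880\Downloads> .\llama-server-cuda.exe -m .\Qwen3-Coder-30B-APEX-I-Quality.gguf `
>>   -ngl 99 -ot ".ffn_.*_exps.=CPU" -fa on -c 32768 -ub 512 --mlock --port 11434
# -ot routes the expert FFN tensors to CPU RAM; -ngl 99 keeps all remaining layers on the GPU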
[3/6] Checking files...
Using bundled binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-Coder-30B-APEX-I-Quality.gguf [cached]
[4/6] Preflight check...
VRAM sufficient
[5/6] Warmup benchmark...
Probe 1: ctx=128K ... 13.6 tok/s (< 18, too slow)
Probe 2: ctx=64K ... 14.6 tok/s (< 18, too slow)
Probe 3: ctx=32K ... 15.7 tok/s (< 18, too slow)
Probe 4: ctx=16K ... 14.8 tok/s (< 18, too slow)
Probe 5: ctx=8K ... 13.8 tok/s (< 18, too slow)
Tune ubatch: ub=128 → 14.5 tok/s; ub=512 → 14.6 tok/s
Selected: 14.6 tok/s @ 32K ctx
Saved profile: C:\Users\cfm880\.kaiwu\profiles\qwen3-coder-30b-apex-i-quality_sm86_16384mb_ddr4.json
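The tok/s figures above are kaiwu's own probe numbers. For an independent decode-speed reading with the same weights you can use llama.cpp's llama-bench tool (not bundled with kaiwu, so this assumes you have a matching llama.cpp build on hand):

PS C:\Users\cfm880\Downloads> .\llama-bench.exe -m .\Qwen3-Coder-30B-APEX-I-Quality.gguf -ngl 99 -p 512 -n 128 -ub 512
# the pp512 row measures prompt processing, the tg128 row measures token generation
# (the generation figure is the one comparable to kaiwu's probe)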
[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
llama-server started (PID 17428, port 11434)
Kaiwu proxy started (port 11435)
2026/04/25 14:35:09 Kaiwu proxy listening on :11435 → llama-server :11434
┌─────────────────────────────────────────────────┐
│ Ready Qwen3-Coder-30B-A3B-Instruct @ 14.6 tok/s │
│ API: http://127.0.0.1:11435/v1/chat/completions │
│ Model folder: C:\Users\cfm880\.kaiwu\models     │
└─────────────────────────────────────────────────┘
Run kaiwu inject to connect your IDE    Ctrl+C to stop
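The endpoint in the box is the standard OpenAI-compatible chat completions route served by llama-server behind the Kaiwu proxy, so any OpenAI-style client should work against it. A minimal smoke test from the same PowerShell session might look like this (the model field is largely cosmetic for a single-model server):

PS C:\Users\cfm880\Downloads> $body = @{ model = "qwen3-coder"; messages = @(@{ role = "user"; content = "Write hello world in Go." }) } | ConvertTo-Json -Depth 5
PS C:\Users\cfm880\Downloads> Invoke-RestMethod -Uri http://127.0.0.1:11435/v1/chat/completions -Method Post -ContentType "application/json" -Body $body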
─ Live monitor (idle) ─────────────────── refresh every 2s ─
reuse:1024 KV:f16 32K ctx ub512 mlock
   Speed        VRAM          RAM          GPU        Temp
   tok/s      6.4/16 GB    30.4/64 GB      0%         50°C
[..........] [====......] [====......] [..........] [=====.....]
─────────────────────────────────────────────────────────
Context [....................] 0.0K / 32K    32.0K free
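If the monitor ever shows the proxy up but requests hanging, llama-server itself exposes a /health endpoint on its own port (11434 here), which is a quick way to tell whether the backend or the proxy is the problem:

PS C:\Users\cfm880\Downloads> Invoke-RestMethod -Uri http://127.0.0.1:11434/health
# returns {"status":"ok"} once the model is loaded and the server is ready to serve requests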