V2EX › ahdw
1 site indexed in VXNA, 12 articles
V2EX member #328218, joined on 2018-07-10 16:05:54 +08:00
Today's activity rank 24469
Per ahdw's settings, the topics list is hidden
Deals info, including closed deals, is not hidden
ahdw's recent replies
I'd suggest looking at the community benchmarks of oMLX; don't use llama.cpp, it wastes Apple hardware.
Can't reproduce.
The「然而并没有」("but it didn't happen") node, standing by.
@Hermitist Qwen3.5-9B is the smartest small model I've played with, and you still think it's dumb? What scenario exactly? Among models at 9B and under, I haven't seen anything smarter...
OpenCode Go's quota is tiny: burn through two 5-hour allowances in a single day and you've effectively used up the weekly quota and the half-month quota.
Going all out like that, a month's quota only covers about 3 days of actual use, with a week-long gap in the middle.
The upside is that the Go plan bills the various vendors' models at essentially the same rate, with hardly any markup. GLM 5.1 is usable as a budget Opus4.6.

Go plan + top-up (in USD) gets you GLM 5.1.
DeepSeek V4 Pro is also great value right now. Top up and use the API as the main driver for Sisyphus Agent, and when it can't crack something, switch to the GLM in the Go plan. Still very capable.
Give Qwen3.5-9B a try, it's a pleasant surprise.

On an idle 16GB RAM M1 Pro MBP, I run the mixed-quantization Qwen3.5-9B-MLX-OptiQ-4Bit build with oMLX (which now supports TurboQuant quantization of the KV cache; I use 4-bit). It holds out to about 45K of context before OOM. In practice I make sure the context stays under 32K and leave 10K for thinking and the reply, which still leaves a 2-3K buffer.
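To make the budget concrete, here is a minimal sketch of the bookkeeping (the 45K / 32K / 10K figures are the ones from my runs above; how you count prompt tokens depends on your stack):

```python
# Context budget for a 16GB M1 Pro running Qwen3.5-9B-MLX-OptiQ-4Bit (numbers observed above).
OOM_CEILING = 45_000          # total context where OOM starts to hit
PROMPT_LIMIT = 32_000         # keep the prompt (history + new input) under this
THINK_REPLY_RESERVE = 10_000  # room reserved for thinking tokens plus the reply

def check_budget(prompt_tokens: int) -> int:
    """Return the remaining safety buffer for a prompt of the given token count."""
    assert prompt_tokens <= PROMPT_LIMIT, "trim history before sending"
    return OOM_CEILING - prompt_tokens - THINK_REPLY_RESERVE

print(check_budget(32_000))  # 3000 -> the 2-3K buffer mentioned above
```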

As for speed, I also ran a benchmark:
1k -> PP 136.6 TG 28.4 tok/s
4k -> PP 140.1 TG 27.4 tok/s
8k -> PP 139.3 TG 26.2 tok/s
16k -> PP 136.7 TG 23.9 tok/s
32k -> PP 131.1 TG 20.1 tok/s
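
For reference, PP (prompt processing) and TG (token generation) are just tokens divided by wall-clock time for the prefill and decode phases. A rough harness sketch, with `prefill` and `decode_step` as stand-ins for whatever calls your runtime exposes (not oMLX's actual API):

```python
import time

def bench(prefill, decode_step, prompt_tokens: list, gen_tokens: int = 256):
    """Measure prompt-processing and generation throughput in tok/s.

    prefill(tokens) and decode_step() are hypothetical hooks; swap in the
    calls your runtime actually provides.
    """
    t0 = time.perf_counter()
    prefill(prompt_tokens)           # process the whole prompt once
    t1 = time.perf_counter()
    for _ in range(gen_tokens):      # then decode one token at a time
        decode_step()
    t2 = time.perf_counter()

    pp = len(prompt_tokens) / (t1 - t0)   # PP tok/s
    tg = gen_tokens / (t2 - t1)           # TG tok/s
    return pp, tg
```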

Your machine shouldn't be any older than this one, so performance should be even better.

As for intelligence, I ran some tests too:
1. The farmer-crossing-the-bridge problem: answered correctly instantly, no thinking mode needed
2. The car-washing problem: needs roughly 200-300 s of thinking, but gets it right every time
3. The hunter-shooting-birds problem: correct after brief thinking
4. Put a treadmill on the runway and ask whether the plane can take off: correct after thinking
5. Which is bigger, 9.11 or 9.9: correct after brief thinking

There are plenty more questions like these and it hasn't tripped up on any of them; it just takes a while to think.
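
If you'd rather script these spot checks than type them in by hand, something like this works against any local OpenAI-compatible endpoint. Whether oMLX serves one is an assumption on my part, and the base_url and model name below are placeholders:

```python
from openai import OpenAI

# Assumes a local OpenAI-compatible server; adjust base_url / model to your setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

questions = [
    "A farmer needs to cross a bridge ...",   # fill in the full riddle text
    "Which is larger, 9.11 or 9.9?",
]

for q in questions:
    resp = client.chat.completions.create(
        model="Qwen3.5-9B-MLX-OptiQ-4Bit",
        messages=[{"role": "user", "content": q}],
    )
    print(q, "->", resp.choices[0].message.content)
```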

I also benchmarked the single-quantization build, Qwen3.5-9B-MLX-4Bit, which is slightly faster:
1k -> PP 136.3 TG 30.3 tok/s
4k -> PP 140.0 TG 29.1 tok/s
8k -> PP 139.3 TG 27.9 tok/s
16k -> PP 136.7 TG 25.3 tok/s
32k -> PP 131.1 TG 21.2 tok/s

Thinking it over, mixed precision probably holds up more stably in certain edge cases at long context, so this bit of speed loss is acceptable.
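
To put a number on that loss: comparing the TG columns of the two runs above, the mixed-quant build is only about 5-6% slower at every context length:

```python
# TG tok/s from the two benchmark runs above (mixed OptiQ-4Bit vs plain 4Bit).
mixed = {1: 28.4, 4: 27.4, 8: 26.2, 16: 23.9, 32: 20.1}
plain = {1: 30.3, 4: 29.1, 8: 27.9, 16: 25.3, 32: 21.2}

for k in mixed:
    loss = (1 - mixed[k] / plain[k]) * 100
    print(f"{k}k context: {loss:.1f}% slower TG")
# Prints roughly 6.3 / 5.8 / 6.1 / 5.5 / 5.2 percent.
```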

I also tried llama.cpp, built with Metal optimizations enabled and paired with TurboQuant+, and it's not remotely competitive with oMLX. The context window can be a bit larger, but it's too slow: above 16K it drops under 10 tok/s.
It's already mid-April; isn't it too hot to be soaking in a hot spring?
@oldlamp Just spend more and you can have both speed and quality: go straight to a Mac Studio with 512GB of unified memory, haha.

Sigh, where in this world is there a way to have all three?
Apr 14
Replied to a topic by ahdw in Local LLM: Running large models on an idle 16GB M1 Pro MBP
```
main: loading model
srv load_model: loading model '/path/to/TurboQuant/models/gemma-4-E4B-it-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 11441 MiB of device memory vs. 14199 MiB of free device memory
llama_params_fit_impl: will leave 2757 >= 1024 MiB of free device memory, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 0.39 seconds
llama_model_load_from_file_impl: using device MTL0 (Apple M1 Pro) (unknown id) - 14199 MiB free

print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.62 GiB (5.28 BPW)

load_tensors: CPU_Mapped model buffer size = 360.00 MiB
load_tensors: MTL0_Mapped model buffer size = 4731.51 MiB

llama_context: n_ctx_seq (49152) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 1.00 MiB
llama_kv_cache_iswa: creating non-SWA KV cache, size = 49152 cells
llama_kv_cache: MTL0 KV buffer size = 306.00 MiB
llama_kv_cache: size = 306.00 MiB ( 49152 cells, 4 layers, 1/1 seqs), K (q8_0): 204.00 MiB, V (turbo4): 102.00 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
```
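
The KV buffer size in that log checks out against the usual cells × layers × per-token KV width × bytes-per-element accounting. A quick sanity check; the 1024-element K/V width per layer is inferred from the numbers, not printed in the log, and the ~4.25 bits per element for the 4-bit V cache is likewise derived from the reported sizes:

```python
# Sanity check of the 306 MiB KV buffer reported in the log above.
cells, layers, kv_width = 49_152, 4, 1024   # kv_width (per-token K/V elements per layer) is inferred
k_bits_per_elem = 8.5                       # q8_0 stores ~8.5 bits per element
v_bits_per_elem = 4.25                      # the 4-bit V cache works out to ~4.25 bits per element here

elems = cells * layers * kv_width
k_mib = elems * k_bits_per_elem / 8 / 2**20
v_mib = elems * v_bits_per_elem / 8 / 2**20
print(f"K ≈ {k_mib:.0f} MiB, V ≈ {v_mib:.0f} MiB, total ≈ {k_mib + v_mib:.0f} MiB")
# -> K ≈ 204 MiB, V ≈ 102 MiB, total ≈ 306 MiB, matching the log
```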