Easier-to-read table version on Rentry: https://rentry.co/5utrg5cy

LLM Performance Test

Hardware

GMKtec K8 Plus
CPU: AMD Ryzen 8845HS
RAM: 64GB DDR5-5600 SODIMM (dual channel)
GPU: Radeon 780M, 8GB VRAM (VRAM size set in the BIOS)
OS: Ubuntu 24.04

System configuration changes:

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.gttsize=49152 ttm.pages_limit=12582912"
The main purpose is to let a single process use up to 48GB of system memory through the iGPU.
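As a quick sanity check, both parameters describe the same 48 GiB limit in different units. This sketch assumes amdgpu.gttsize is given in MiB and ttm.pages_limit counts 4 KiB pages (the usual x86-64 page size); the helper names are illustrative only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86-64


def gttsize_to_gib(gttsize_mib: int) -> float:
    """Convert amdgpu.gttsize (in MiB) to GiB."""
    return gttsize_mib * MIB / GIB


def pages_limit_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert ttm.pages_limit (a page count) to GiB."""
    return pages * page_size / GIB


def pages_for_gib(gib: int, page_size: int = PAGE_SIZE) -> int:
    """Page count needed for a TTM limit of `gib` GiB."""
    return gib * GIB // page_size


if __name__ == "__main__":
    print(gttsize_to_gib(49152))         # 48.0
    print(pages_limit_to_gib(12582912))  # 48.0
    print(pages_for_gib(48))             # 12582912
```

To pick a different limit, scale both values together (for example pages_for_gib(32) for 32 GiB). On Ubuntu, changes to /etc/default/grub take effect only after running sudo update-grub and rebooting.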

Benchmark tool: llama-benchy 0.3.7
Command:
llama-benchy --base-url <api url> --model zerofata/G4-MeroMero-26B-A4B --depth 0 4096 8192 16384 32768 --tg 128 --latency-mode generation --enable-prefix-caching
Model under test: G4-MeroMero-26B-A4B, Q5_K_M quantization (a fine-tune of Gemma 4 26B A4B)

Backends tested

koboldcpp-1.112.2

Launch command:
./koboldcpp-linux-x64-nocuda --model ./G4-MeroMero-26B-A4B-Q5_K_M.gguf --host 0.0.0.0 --threads 7 --usevulkan 0 --blasbatchsize 2048 --gpulayers 49 --contextsize 32768 --flashattention --skiplauncher --jinja --mmproj ./mmproj-Gemma-4-26b-a4b-f16.gguf --mlock --usemmap --jinjatemplate ./chat_template.jinja

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
zerofata/G4-MeroMero-26B-A4B	pp2048	419.95 ± 2.39		5288.49 ± 27.66	4879.28 ± 27.66	5288.49 ± 27.66
zerofata/G4-MeroMero-26B-A4B	tg128	18.41 ± 0.04	20.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d4096	384.90 ± 3.16		11054.21 ± 87.02	10644.99 ± 87.02	11054.21 ± 87.02
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d4096	17.54 ± 0.05	19.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d4096	350.57 ± 2.01		6251.27 ± 33.44	5842.05 ± 33.44	6251.27 ± 33.44
zerofata/G4-MeroMero-26B-A4B	tg128 @ d4096	17.30 ± 0.02	19.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d8192	357.27 ± 1.90		23341.93 ± 122.05	22932.71 ± 122.05	23341.93 ± 122.05
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d8192	16.97 ± 0.02	18.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d8192	313.21 ± 0.19		6947.91 ± 3.87	6538.69 ± 3.87	6947.91 ± 3.87
zerofata/G4-MeroMero-26B-A4B	tg128 @ d8192	16.75 ± 0.02	18.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d16384	311.77 ± 1.46		52964.47 ± 245.13	52555.26 ± 245.13	52964.47 ± 245.13
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d16384	16.19 ± 0.08	17.33 ± 0.47
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d16384	254.26 ± 0.49		8464.01 ± 15.68	8054.79 ± 15.68	8464.01 ± 15.68
zerofata/G4-MeroMero-26B-A4B	tg128 @ d16384	15.94 ± 0.02	17.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d32768	261.11 ± 1.65		125417.69 ± 791.20	125008.48 ± 791.20	125417.69 ± 791.20
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d32768	14.70 ± 0.02	16.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d32768	142.70 ± 1.09		14761.37 ± 109.20	14352.15 ± 109.20	14761.37 ± 109.20
zerofata/G4-MeroMero-26B-A4B	tg128 @ d32768	14.77 ± 0.02	16.00 ± 0.00
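How the columns relate: in the Koboldcpp table above, ttfr minus est_ppt is the same on every row (about 409.2 ms), and t/s is, to within about 0.1%, the number of tokens processed divided by est_ppt. This reading is inferred from the numbers, not from llama-benchy's documentation; the small mismatch is presumably because the tool averages per-run speeds and per-run times separately. The sketch checks the inference against three rows copied from the table.

```python
# (test, tokens processed, t/s, ttfr ms, est_ppt ms), copied from the Koboldcpp table
ROWS = [
    ("pp2048", 2048, 419.95, 5288.49, 4879.28),
    ("ctx_pp @ d4096", 4096, 384.90, 11054.21, 10644.99),
    ("pp2048 @ d32768", 2048, 142.70, 14761.37, 14352.15),
]


def implied_latency_ms(ttfr_ms: float, est_ppt_ms: float) -> float:
    """Fixed per-request offset that separates ttfr from est_ppt."""
    return ttfr_ms - est_ppt_ms


def implied_tps(tokens: int, est_ppt_ms: float) -> float:
    """Prompt-processing speed implied by token count and est_ppt."""
    return tokens / (est_ppt_ms / 1000.0)


for name, tokens, tps, ttfr, est_ppt in ROWS:
    latency = implied_latency_ms(ttfr, est_ppt)
    calc = implied_tps(tokens, est_ppt)
    print(f"{name}: offset {latency:.2f} ms, reported {tps} t/s, implied {calc:.2f} t/s")
```

Applying the same subtraction to the first row of the other two tables gives an offset of about 118 ms for llamacpp-rocm and about 131 ms for the Vulkan build, so Koboldcpp's faster prompt processing is partly offset by a larger fixed per-request overhead.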

llamacpp-rocm b1256

Launch command:
./llamacpp-rocm/llama-server -m ./G4-MeroMero-26B-A4B-Q5_K_M.gguf -ngl 99 -c 32768 --temp 1 --top-k 64 --top-p 0.95 --host 0.0.0.0 -mm ./mmproj-Gemma-4-26b-a4b-f16.gguf --chat-template-file ./chat_template.jinja
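The 16K-depth tests produced anomalous results (token generation jumped to about 66 t/s), possibly because the model broke down and produced repetitive output.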

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
zerofata/G4-MeroMero-26B-A4B	pp2048	335.70 ± 14.45		6233.64 ± 258.34	6115.70 ± 258.34	6233.64 ± 258.34
zerofata/G4-MeroMero-26B-A4B	tg128	15.42 ± 0.01	16.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d4096	290.31 ± 12.24		14253.80 ± 580.96	14135.86 ± 580.96	14253.80 ± 580.96
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d4096	14.07 ± 0.01	15.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d4096	238.90 ± 2.19		8691.36 ± 78.59	8573.42 ± 78.59	8691.36 ± 78.59
zerofata/G4-MeroMero-26B-A4B	tg128 @ d4096	13.79 ± 0.01	14.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d8192	260.37 ± 3.20		31589.77 ± 383.96	31471.83 ± 383.96	31589.77 ± 383.96
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d8192	13.67 ± 0.01	14.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d8192	218.36 ± 0.69		9497.17 ± 29.59	9379.23 ± 29.59	9497.17 ± 29.59
zerofata/G4-MeroMero-26B-A4B	tg128 @ d8192	13.55 ± 0.00	14.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d16384	241.87 ± 3.54		67874.56 ± 1000.78	67756.62 ± 1000.78	67874.56 ± 1000.78
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d16384	66.75 ± 0.02	70.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d16384	204.84 ± 2.69		10117.85 ± 130.36	9999.91 ± 130.36	10117.85 ± 130.36
zerofata/G4-MeroMero-26B-A4B	tg128 @ d16384	66.23 ± 0.04	69.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d32768	191.92 ± 10.78		171424.55 ± 10017.11	171306.61 ± 10017.11	171424.55 ± 10017.11
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d32768	12.37 ± 0.07	13.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d32768	142.47 ± 3.70		14502.40 ± 380.91	14384.46 ± 380.91	14502.40 ± 380.91
zerofata/G4-MeroMero-26B-A4B	tg128 @ d32768	12.34 ± 0.00	13.00 ± 0.00

llamacpp b8999 (vulkan backend)

Launch command:
./llama-b8999/llama-server -m ./G4-MeroMero-26B-A4B-Q5_K_M.gguf -ngl 99 -c 49152 --temp 1 --top-k 64 --top-p 0.95 --host 0.0.0.0 -mm ./mmproj-Gemma-4-26b-a4b-f16.gguf --chat-template-file ./chat_template.jinja
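The 32K-depth test produced anomalous results (ctx_tg @ d32768 shows 31.67 ± 20.37 t/s), possibly because the model broke down and produced repetitive output.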

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
zerofata/G4-MeroMero-26B-A4B	pp2048	315.62 ± 12.25		6633.52 ± 244.36	6502.53 ± 244.36	6633.52 ± 244.36
zerofata/G4-MeroMero-26B-A4B	tg128	20.70 ± 0.02	21.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d4096	291.80 ± 10.49		14189.23 ± 493.34	14058.24 ± 493.34	14189.23 ± 493.34
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d4096	19.54 ± 0.00	20.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d4096	275.22 ± 1.81		7572.64 ± 49.08	7441.65 ± 49.08	7572.64 ± 49.08
zerofata/G4-MeroMero-26B-A4B	tg128 @ d4096	19.53 ± 0.01	20.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d8192	286.29 ± 1.35		28747.85 ± 134.81	28616.86 ± 134.81	28747.85 ± 134.81
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d8192	19.68 ± 0.03	20.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d8192	252.09 ± 0.21		8254.93 ± 6.84	8123.94 ± 6.84	8254.93 ± 6.84
zerofata/G4-MeroMero-26B-A4B	tg128 @ d8192	18.64 ± 0.03	19.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d16384	269.11 ± 1.28		61018.78 ± 289.55	60887.79 ± 289.55	61018.78 ± 289.55
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d16384	18.08 ± 0.00	19.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d16384	217.98 ± 1.58		9526.90 ± 67.69	9395.91 ± 67.69	9526.90 ± 67.69
zerofata/G4-MeroMero-26B-A4B	tg128 @ d16384	18.01 ± 0.02	19.00 ± 0.00
zerofata/G4-MeroMero-26B-A4B	ctx_pp @ d32768	231.32 ± 0.68		141795.99 ± 417.40	141665.00 ± 417.40	141795.99 ± 417.40
zerofata/G4-MeroMero-26B-A4B	ctx_tg @ d32768	31.67 ± 20.37	41.67 ± 33.47
zerofata/G4-MeroMero-26B-A4B	pp2048 @ d32768	168.40 ± 1.35		12292.98 ± 97.49	12161.99 ± 97.49	12313.51 ± 124.96
zerofata/G4-MeroMero-26B-A4B	tg128 @ d32768	16.47 ± 0.01	17.00 ± 0.00

Overall conclusions (analysis produced with Gemini)

1. Hardware potential and system tuning

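Memory configuration: Raising amdgpu.gttsize (together with ttm.pages_limit) to 48GB through the Linux kernel command line (GRUB) is the key to running a quantized 26B model smoothly on this kind of iGPU. The 64GB of physical RAM gives the 780M enough room for the large model file plus contexts of up to 32K tokens.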
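Inference performance: Overall, on the 8845HS platform the 26B Q5 model generates roughly 15-20 t/s with Koboldcpp or the Vulkan build of llama.cpp (about 12-15 t/s with ROCm), slowing as the context grows. That is very practical for single-user use and faster than most people read. The speed is possible mainly because the model is a mixture-of-experts design with only about 4B of its 26B parameters active per token (the "A4B" in its name).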
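A rough way to see why these speeds are plausible: token generation on an iGPU is usually limited by memory bandwidth, because each generated token has to read the active weights once. The sketch estimates an upper bound under loudly stated assumptions: theoretical dual-channel DDR5-5600 bandwidth (64-bit per channel), about 4B active parameters (taken from the "A4B" in the model name), and about 5.5 bits per weight for Q5_K_M. It ignores KV cache reads and real-world bandwidth efficiency, so the true ceiling is lower.

```python
def ddr_bandwidth_gbps(mt_per_s: float, bus_bytes_per_channel: int = 8, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mt_per_s * 1e6 * bus_bytes_per_channel * channels / 1e9


def tg_ceiling_tps(bandwidth_gbps: float, active_params: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every token reads all active weights exactly once."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token


bw = ddr_bandwidth_gbps(5600)  # DDR5-5600, dual channel
# Assumptions: ~4B active parameters, ~5.5 bits per weight for Q5_K_M
ceiling = tg_ceiling_tps(bw, active_params=4e9, bits_per_weight=5.5)
print(f"peak bandwidth ~{bw:.1f} GB/s, generation ceiling ~{ceiling:.0f} t/s")
```

Under these assumptions the ceiling is about 33 t/s, so the measured 18-21 t/s at short context is roughly 55-65% of it. A dense 26B model at the same quantization would have a ceiling closer to 5 t/s.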

2. Backend performance comparison

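Metric	Koboldcpp-1.112.2	Llamacpp (Vulkan)	Llamacpp-rocm
Prompt processing (pp2048, depth 0)	Fastest (~420 t/s)	Slowest at depth 0 (~316 t/s), but faster than ROCm at every nonzero depth	Middle at depth 0 (~336 t/s), slowest at every nonzero depth
Token generation (tg128, depth 0)	Steady (~18.4 t/s)	Highest (~20.7 t/s)	Slowest (~15.4 t/s)
Long-context stability	Very high; performance degrades gradually as context grows.	Anomalous data at high context (32K).	Anomalous data at medium context (16K).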
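The comparison table summarizes depth 0 only. The sketch below copies the pp2048 and tg128 means from the three result tables (leaving out the anomalous llamacpp-rocm tg128 @ d16384 row) and picks the fastest backend at each depth; the dictionary layout and backend keys are just one convenient choice.

```python
# Mean t/s from the result tables; depth 0 is the plain pp2048 / tg128 row.
PP2048 = {
    "koboldcpp": {0: 419.95, 4096: 350.57, 8192: 313.21, 16384: 254.26, 32768: 142.70},
    "vulkan":    {0: 315.62, 4096: 275.22, 8192: 252.09, 16384: 217.98, 32768: 168.40},
    "rocm":      {0: 335.70, 4096: 238.90, 8192: 218.36, 16384: 204.84, 32768: 142.47},
}
TG128 = {
    "koboldcpp": {0: 18.41, 4096: 17.30, 8192: 16.75, 16384: 15.94, 32768: 14.77},
    "vulkan":    {0: 20.70, 4096: 19.53, 8192: 18.64, 16384: 18.01, 32768: 16.47},
    # ROCm tg128 @ d16384 (66.23 t/s) is left out as an anomaly.
    "rocm":      {0: 15.42, 4096: 13.79, 8192: 13.55, 32768: 12.34},
}


def winners(table: dict) -> dict:
    """Return {depth: (fastest backend, t/s)}, using only backends with data at that depth."""
    depths = sorted({d for results in table.values() for d in results})
    best = {}
    for depth in depths:
        candidates = [(results[depth], name) for name, results in table.items() if depth in results]
        speed, name = max(candidates)
        best[depth] = (name, speed)
    return best


for label, table in (("pp2048", PP2048), ("tg128", TG128)):
    for depth, (name, speed) in winners(table).items():
        print(f"{label} @ d{depth}: {name} ({speed} t/s)")
```

Run as-is, this shows Koboldcpp winning pp2048 up to 16K depth and Vulkan winning it at 32K, while Vulkan has the fastest tg128 at every depth.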

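Koboldcpp: The most balanced and reliable backend in this test. Its prompt processing leads clearly at every depth up to 16K, and at 32K depth (32768) it still generates at a steady 14.7 t/s with no sign of model breakdown or anomalies, making it the first choice for long-context use. At 32K depth its pp2048 lead disappears (142.70 t/s versus 168.40 t/s for Vulkan).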
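Llamacpp (Vulkan): Delivers the highest generation speed (over 20 t/s at depth 0), but at 32K depth the ctx_tg result swings wildly (31.67 ± 20.37 t/s) while tg128 at the same depth stays normal (16.47 t/s). This suggests the driver or backend may not yet be fully stable under extreme context loads.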
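Llamacpp-rocm: Underwhelming on this hardware. It has the lowest generation speed and hits anomalies earlier, at 16K context, which suggests that ROCm's optimization or memory management on this APU still has room for improvement.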

3. Observed anomalies

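In the llamacpp-rocm (16K) and Vulkan (32K) tests, speeds were anomalous (for example, jumping to 66 t/s) or showed very large variance. Such results are likely related to model breakdown, repetitive output, or KV cache overflow. This suggests that for very long context inference on an iGPU, backend robustness matters more than raw peak speed.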
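Rows like these can be caught automatically with a simple check. This is a hypothetical heuristic, not a documented llama-benchy feature: flag a generation result whose mean is far above the backend's depth-0 speed (generation should not get faster as context grows) or whose standard deviation is large relative to its mean. The thresholds (1.5× and 20%) are arbitrary assumptions.

```python
def flag_anomalies(baseline_tps, rows, max_ratio=1.5, max_rel_std=0.2):
    """Return names of generation rows that look suspicious.

    baseline_tps: tg128 mean at depth 0 for this backend.
    rows: list of (name, mean, std) for generation tests at nonzero depth.
    """
    flagged = []
    for name, mean, std in rows:
        too_fast = mean > baseline_tps * max_ratio  # faster than depth 0 by a wide margin
        too_noisy = std > mean * max_rel_std        # run-to-run spread too large
        if too_fast or too_noisy:
            flagged.append(name)
    return flagged


# Generation rows copied from the llamacpp-rocm and Vulkan tables.
ROCM = [("ctx_tg @ d4096", 14.07, 0.01), ("ctx_tg @ d16384", 66.75, 0.02),
        ("tg128 @ d16384", 66.23, 0.04), ("ctx_tg @ d32768", 12.37, 0.07)]
VULKAN = [("ctx_tg @ d16384", 18.08, 0.00), ("ctx_tg @ d32768", 31.67, 20.37),
          ("tg128 @ d32768", 16.47, 0.01)]

print(flag_anomalies(15.42, ROCM))    # ['ctx_tg @ d16384', 'tg128 @ d16384']
print(flag_anomalies(20.70, VULKAN))  # ['ctx_tg @ d32768']
```

Both checks are needed here: the ROCm 16K rows are stable but implausibly fast, while the Vulkan 32K row is mainly flagged for its spread.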

Recommendations

For users of AMD Ryzen 8000-series APUs who want to run models at the 26B scale:

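Recommended backend: Prefer Koboldcpp; it has a clear advantage in long-context stability and prompt processing speed.
Maximum speed: For short-context chat only, try Llamacpp (Vulkan) for the highest generation speed.
Environment setup: Be sure to change the kernel parameters to lift the GPU memory limit; otherwise you cannot take full advantage of the 64GB of RAM.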
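To confirm the kernel parameters are actually active after a reboot, one option is to read /proc/cmdline, which holds the command line the running kernel was booted with. The sketch below is a minimal check; the expected values are the ones used in this test, the function names are illustrative, and quoted values containing spaces are not handled.

```python
from pathlib import Path

EXPECTED = {"amdgpu.gttsize": "49152", "ttm.pages_limit": "12582912"}


def parse_cmdline(text: str) -> dict:
    """Turn 'a=1 b c=2' into {'a': '1', 'b': '', 'c': '2'} (no quote handling)."""
    params = {}
    for token in text.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def missing_params(params: dict, expected: dict = EXPECTED) -> dict:
    """Return expected parameters that are absent or set to a different value."""
    return {k: v for k, v in expected.items() if params.get(k) != v}


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        problems = missing_params(parse_cmdline(path.read_text()))
        print("OK" if not problems else f"Not applied: {problems}")
    else:
        print("/proc/cmdline not found (not running on Linux?)")
```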

昨日東風

Saturday, May 2, 2026

LLM Performance Test on Ryzen 8845HS w/ Radeon 780M
